Autonomous Health Framework
User’s Guide
12c Release 2 (12.2)
E85790-01
May 2017
Copyright © 2016, 2017, Oracle and/or its affiliates. All rights reserved.
Primary Author: Nirmal Kumar
Contributing Authors: Richard Strohm, Mark Bauer, Douglas Williams, Aparna Kamath, Janet Stern, Subhash Chandra
Contributors: Girdhari Ghantiyala, Gareth Chapman, Robert Caldwell, Vern Wagman, Mark Scardina, Ankita Khandelwal, Girish Adiga, Walter Battistella, Jesus Guillermo Munoz Nunez, Sahil Kumar, Daniel Semler, Carol Colrain
This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means.
Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.
If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:
U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are
"commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs.
No other rights are granted to the U.S. Government.
This software or hardware is developed for general use in a variety of information management applications.
It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.
This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.
Contents
Preface
Changes in This Release for Oracle Autonomous Health Framework User’s Guide Release 12c
New Features for Oracle Database 12c Release 2 (12.2)
New Features for Oracle ORAchk and Oracle EXAchk 12.1.0.2.7
New Features for Cluster Health Monitor 12.2.0.1.1
New Features for Oracle Trace File Analyzer 12.2.0.1.1
New Features for Oracle Database Quality of Service Management 12c Release 2
1 Introduction to Oracle Autonomous Health Framework
1.1 Oracle Autonomous Health Framework Problem and Solution Space
1.1.1 Availability Issues
1.1.2 Performance Issues
1.2 Components of Oracle Autonomous Health Framework
1.2.1 Introduction to Oracle ORAchk and Oracle EXAchk
1.2.2 Introduction to Cluster Health Monitor
1.2.3 Introduction to Oracle Trace File Analyzer Collector
1.2.4 Introduction to Oracle Cluster Health Advisor
1.2.5 Introduction to Memory Guard
1.2.6 Introduction to Hang Manager
1.2.7 Introduction to Oracle Database Quality of Service (QoS) Management
2 Analyzing Risks and Complying with Best Practices
2.1 Using Oracle ORAchk and Oracle EXAchk to Automatically Check for Risks and System Health
2.2 Email Notification and Health Check Report Overview
2.2.1
2.2.2 What does the Health Check Report Contain?
2.2.3 Subsequent Email Notifications
2.3 Configuring Oracle ORAchk and Oracle EXAchk
2.3.1 Deciding Which User Should Run Oracle ORAchk or Oracle EXAchk
2.3.2
2.3.3 Configuring Email Notification System
2.4 Using Oracle ORAchk and Oracle EXAchk to Manually Generate Health
2.4.1 Running Health Checks On-Demand
2.4.2 Running Health Checks in Silent Mode
2.4.3 Running On-Demand With or Without the Daemon
2.4.4
2.4.5
2.5 Managing the Oracle ORAchk and Oracle EXAchk Daemons
2.5.1 Starting and Stopping the Daemon
2.5.2 Configuring the Daemon for Automatic Restart
2.5.3 Setting and Getting Options for the Daemon
2.5.3.1
2.5.3.2
2.5.3.3
2.5.3.4
2.5.3.5
2.5.3.6
2.5.3.7 Setting Multiple Option Profiles for the Daemon
2.5.3.8 Getting Existing Options for the Daemon
2.5.4 Querying the Status and Next Planned Daemon Run
2.6
2.7 Tracking File Attribute Changes and Comparing Snapshots
2.7.1 Using the File Attribute Check With the Daemon
2.7.2 Taking File Attribute Snapshots
2.7.3 Including Directories to Check
2.7.4 Excluding Directories from Checks
2.7.5
2.7.6 Designating a Snapshot As a Baseline
2.7.7
2.7.8
2.8 Collecting and Consuming Health Check Data
2.8.1 Selectively Capturing Users During Login
2.8.2 Bulk Mapping Systems to Business Units
2.8.3 Adjusting or Disabling Old Collections Purging
2.8.4 Uploading Collections Automatically
2.8.5 Viewing and Reattempting Failed Uploads
2.8.6
2.8.7 Finding Which Checks Require Privileged Users
2.8.8 Creating or Editing Incidents Tickets
2.8.8.1
2.8.9 Viewing Clusterwide Linux Operating System Health Check (VMPScan)
2.9 Locking and Unlocking Storage Server Cells
2.10 Integrating Health Check Results with Other Tools
2.10.1 Integrating Health Check Results with Oracle Enterprise Manager
2.10.2 Integrating Health Check Results with Third-Party Tool
2.10.3 Integrating Health Check Results with Custom Application
2.11 Troubleshooting Oracle ORAchk and Oracle EXAchk
2.11.1 How to Troubleshoot Oracle ORAchk and Oracle EXAchk Issues
2.11.2
2.11.3
2.11.4
2.11.5 Slow Performance, Skipped Checks and Timeouts
3 Collecting Operating System Resources Metrics
3.1 Understanding Cluster Health Monitor Services
3.2 Collecting Cluster Health Monitor Data
3.3 Using Cluster Health Monitor from Enterprise Manager Cloud Control
4 Collecting Diagnostic Data and Triaging, Diagnosing, and Resolving Issues
4.1 Understanding Oracle Trace File Analyzer
4.1.1 Oracle Trace File Analyzer Architecture
4.1.2 Oracle Trace File Analyzer Automated Diagnostic Collections
4.1.3 Oracle Trace File Analyzer Collector On-Demand Diagnostic Collections
4.1.3.1 Types of On-Demand Collections
4.2 Getting Started with Oracle Trace File Analyzer
4.2.1 Supported Platforms and Product Versions
4.2.2 Oracle Grid Infrastructure Trace File Analyzer Installation
4.2.3 Oracle Database Trace File Analyzer Installation
4.2.4 Securing Access to Oracle Trace File Analyzer
4.2.5
4.2.6 Configuring Email Notification Details
4.3 Automatically Collecting Diagnostic Data Using the Oracle Trace File Analyzer
4.3.1 Managing the Oracle Trace File Analyzer Daemon
4.3.2 Viewing the Status and Configuration of Oracle Trace File Analyzer
4.3.3
4.3.4
4.3.5 Configuring SSL and SSL Certificates
4.3.5.1
4.3.5.2 Configuring Self-Signed Certificates
4.3.5.3 Configuring CA-Signed Certificates
4.3.6
4.3.6.1
4.3.6.2 Managing the Size of Collections
4.3.7
4.3.7.1 Purging the Repository Automatically
4.3.7.2 Purging the Repository Manually
4.4 Analyzing the Problems Identified
4.5 Manually Collecting Diagnostic Data
4.5.1 Running On-Demand Default Collections
4.5.1.1 Adjusting the Time Period for a Collection
4.5.2 Running On-Demand Event-Driven SRDC Collections
4.5.3 Running On-Demand Custom Collections
4.5.3.1 Collecting from Specific Nodes
4.5.3.2 Collecting from Specific Components
4.5.3.3 Collecting from Specific Directories
4.5.3.4
4.5.3.5 Preventing Copying Zip Files and Trimming Files
4.5.3.6
4.5.3.7 Preventing Collecting Core Files
4.5.3.8 Collecting Incident Packaging Service Packages
4.6 Analyzing and Searching Recent Log Entries
4.7 Managing Oracle Database and Oracle Grid Infrastructure Diagnostic Data
4.7.1 Managing Automatic Diagnostic Repository Log and Trace Files
4.7.2
4.7.3 Purging Oracle Trace File Analyzer Logs Automatically
4.8 Upgrading Oracle Trace File Analyzer Collector by Applying a Patch Set
4.9 Troubleshooting Oracle Trace File Analyzer
5 Proactively Detecting and Diagnosing Performance Issues for Oracle RAC
5.1 Oracle Cluster Health Advisor Architecture
5.2
5.3 Using Cluster Health Advisor for Health Diagnosis
5.4 Calibrating an Oracle Cluster Health Advisor Model for a Cluster Deployment
5.5 Viewing the Details for an Oracle Cluster Health Advisor Model
5.6 Managing the Oracle Cluster Health Advisor Repository
5.7 Viewing the Status of Cluster Health Advisor
6 Resolving Memory Stress
6.1
6.2
6.3 Enabling Memory Guard in Oracle Real Application Clusters (Oracle RAC)
6.4 Use of Memory Guard in Oracle Real Application Clusters (Oracle RAC)
7 Resolving Database and Database Instance Hangs
7.1
7.2 Optional Configuration for Hang Manager
7.3 Hang Manager Diagnostics and Logging
8 Monitoring System Metrics for Cluster Nodes
8.1 Monitoring Oracle Clusterware with Oracle Enterprise Manager
8.2 Monitoring Oracle Clusterware with Cluster Health Monitor
8.3 Using the Cluster Resource Activity Log to Monitor Cluster Resource Failures
9 Monitoring and Managing Database Workload Performance
9.1 What Does Oracle Database Quality of Service (QoS) Management Manage?
9.2 How Does Oracle Database Quality of Service (QoS) Management Work?
9.3
9.4 Benefits of Using Oracle Database Quality of Service (QoS) Management
A Oracle ORAchk and Oracle EXAchk Command-Line Options
A.1 Running Generic Oracle ORAchk and Oracle EXAchk Commands
A.2 Controlling the Scope of Checks
A.3
A.4
A.5
A.6 Controlling the Behavior of the Daemon
A.7 Tracking File Attribute Changes
B OCLUMON Command Reference
B.1
B.2
B.3
B.4
C Diagnostics Collection Script
D Managing the Cluster Resource Activity Log
D.1
D.2
D.3 crsctl get calog retentiontime
D.4
D.5 crsctl set calog retentiontime
E chactl Command Reference
E.1
E.2
E.3
E.4
E.5
E.6
E.7
E.8
E.9
E.10
E.11
E.12
E.13
E.14
E.15
F Oracle Trace File Analyzer Command-Line and Shell Options
F.1 Running Administration Commands
F.1.1
F.1.2
F.1.3
F.1.4
F.2 Running Summary and Analysis Commands
F.2.1
F.2.2
F.2.3
F.2.4
F.2.5
F.2.6
F.3 Running Diagnostic Collection Commands
F.3.1
F.3.2
F.3.3
F.3.3.1
F.3.3.2
F.3.3.3
F.3.3.4
F.3.3.5
F.3.3.6 tfactl ips ADD NEW INCIDENTS PACKAGE
F.3.3.7 tfactl ips GET REMOTE KEYS FILE
F.3.3.8 tfactl ips USE REMOTE KEYS FILE
F.3.3.9
F.3.3.10
F.3.3.11
F.3.3.12
F.3.3.13 tfactl ips GET MANIFEST FROM FILE
F.3.3.14
F.3.3.15
F.3.3.16
F.3.3.17
F.3.3.18
F.3.3.19
F.3.3.20 tfactl ips SHOW INCIDENTS PACKAGE
F.3.3.21
F.3.3.22
F.3.3.23
F.3.4
F.3.5
F.3.6
F.3.7
Index
List of Figures
Oracle Health Check Collections Manager - Administration
Oracle Health Check Collections Manager - Configure Email Server
Oracle Health Check Collections Manager - Notification Job Run status details
Oracle Health Check Collections Manager - Manage Notifications
Oracle Health Check Collections Manager - Sample Email Notification
Oracle Health Check Collections Manager - Sample Diff Report
Manage Users, User Roles and assign System to users
2-9 Don’t Capture User Details (When Login)
2-10 Capture User Details (When Login)
2-11 Assign System to Business Unit
Manage Email Server and Job Details
User-Defined Checks Tab - Audit Check Type
2-18 User-Defined Checks Tab - Audit Check Type - OS Check
2-19 User-Defined Checks Tab - Available Audit Checks
2-20 User-Defined Checks Tab - Download User-Defined Checks
2-21 Oracle ORAchk - Privileged User
2-22 Clusterwide Linux Operating System Health Check (VMPScan)
Compliance Standards Drill-Down
EMCC - Cluster Health Monitoring
Cluster Health Monitoring - Real Time Data
Cluster Health Monitoring - Historical Data
Oracle Trace File Analyzer Architecture
Automatic Diagnostic Collections
5-1 Oracle Cluster Health Advisor Architecture
Cluster Health Advisor Diagnosis HTML Output
List of Tables
Uploading Collection Results into a Database
Trigger Automatic Event Detection
Adjusting the Time Period for a Collection
List of Oracle ORAchk and Oracle EXAchk File Attribute Tracking Options
oclumon debug Command Parameters
oclumon dumpnodeview Command Parameters
B-3 oclumon dumpnodeview SYSTEM View Metric Descriptions
B-4 oclumon dumpnodeview PROCESSES View Metric Descriptions
B-5 oclumon dumpnodeview DEVICES View Metric Descriptions
B-6 oclumon dumpnodeview NICS View Metric Descriptions
B-7 oclumon dumpnodeview FILESYSTEMS View Metric Descriptions
B-8 oclumon dumpnodeview PROTOCOL ERRORS View Metric Descriptions
B-9 oclumon dumpnodeview CPUS View Metric Descriptions
B-10 oclumon manage Command Parameters
diagcollection.pl Script Parameters
crsctl query calog Command Parameters
Cluster Resource Activity Log Fields
chactl monitor Command Parameters
tfactl diagnosetfa Command Parameters
tfactl access Command Parameters
tfactl analyze Command Parameters
tfactl analyze -type Parameter Arguments
F-8 tfactl run Analysis Tools Parameters
F-9 tfactl run Profiling Tools Parameters
tfactl directory Command Parameters
tfactl ips ADD Command Parameters
F-14 tfactl ips ADD FILE Command Parameters
F-15 tfactl ips COPY IN FILE Command Parameters
F-16 tfactl ips REMOVE Command Parameters
F-17 tfactl ips ADD NEW INCIDENTS PACKAGE Command Parameters
F-18 tfactl ips GET REMOTE KEYS FILE Command Parameters
F-19 tfactl ips CREATE PACKAGE Command Parameters
F-20 tfactl ips GENERATE PACKAGE Command Parameters
F-21 tfactl ips DELETE PACKAGE Command Parameters
F-22 tfactl ips GET MANIFEST FROM FILE Command Parameters
F-23 tfactl ips PACK Command Parameters
F-24 tfactl ips SET CONFIGURATION Command Parameters
F-25 tfactl print Command Parameters
F-26 tfactl managelogs Purge Options
F-27 tfactl managelogs Show Options
Preface
Oracle Autonomous Health Framework User’s Guide explains how to use the Oracle Autonomous Health Framework diagnostic components.
The diagnostic components include Oracle ORAchk, Oracle EXAchk, Cluster Health Monitor, Oracle Trace File Analyzer Collector, Oracle Cluster Health Advisor, Memory Guard, and Hang Manager.
Oracle Autonomous Health Framework User’s Guide also explains how to install and configure Oracle Trace File Analyzer Collector.
This Preface contains these topics:
• Audience
• Documentation Accessibility
• Related Documentation
• Conventions
Audience
Database administrators can use this guide to understand how to use the Oracle Autonomous Health Framework diagnostic components. This guide assumes that you are familiar with Oracle Database concepts.
Documentation Accessibility
For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.
Access to Oracle Support
Oracle customers that have purchased support have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.
Related Documentation
For more information, see the following Oracle resources:
Related Topics:
• Oracle Automatic Storage Management Administrator's Guide
• Oracle Database 2 Day DBA
• Oracle Database Concepts
• Oracle Database Examples Installation Guide
• Oracle Database Licensing Information
• Oracle Database New Features Guide
• Oracle Database Readme
• Oracle Database Upgrade Guide
• Oracle Grid Infrastructure Installation and Upgrade Guide
• Oracle Real Application Clusters Installation Guide
Conventions
The following text conventions are used in this document:
Convention   Meaning
boldface     Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary.
italic       Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.
monospace    Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.
Changes in This Release for Oracle Autonomous Health Framework User’s Guide Release 12c
This preface lists changes in Oracle Autonomous Health Framework for Oracle
Database 12c release 2 (12.2) and release 1 (12.1).
Topics:
• New Features for Oracle Database 12c Release 2 (12.2)
• New Features for Oracle ORAchk and Oracle EXAchk 12.1.0.2.7
• New Features for Cluster Health Monitor 12.2.0.1.1
• New Features for Oracle Trace File Analyzer 12.2.0.1.1
• New Features for Hang Manager
• New Features for Memory Guard
• New Features for Oracle Database Quality of Service Management 12c Release 2
New Features for Oracle Database 12c Release 2 (12.2)
These are new features for Oracle Database 12c release 2 (12.2).
• Enhancements to Grid Infrastructure Management Repository (GIMR)
• Oracle Cluster Health Advisor
Oracle Cluster Health Advisor
Oracle Cluster Health Advisor is introduced for Oracle Database 12c release 2 (12.2).
Oracle Cluster Health Advisor collects data from Oracle Real Application Clusters
(Oracle RAC) and Oracle RAC One Node databases, and from operating system and hardware resources. Oracle Cluster Health Advisor then advises how to fix database or performance issues.
Enhancements to Grid Infrastructure Management Repository (GIMR)
Oracle Grid Infrastructure deployment now supports a global off-cluster Grid
Infrastructure Management Repository (GIMR).
New Features for Oracle ORAchk and Oracle EXAchk 12.1.0.2.7
These are new features for Oracle ORAchk and Oracle EXAchk 12.1.0.2.7.
• Simplified Enterprise-Wide Data Configuration and Maintenance
• Tracking Changes to File Attributes
• Find Health Checks that Require Privileged Users to Run
• Support for Broader Range of Oracle Products
• Easier to Run Oracle EXAchk on Oracle Exadata Storage Servers
• New Health Checks for Oracle ORAchk and Oracle EXAchk
Simplified Enterprise-Wide Data Configuration and Maintenance
This release contains several changes that simplify system and user configuration and maintenance, and the upload of health check collection results.
• Bulk Mapping Systems to Business Units
• Selectively Capturing Users During Log In
• Configuring Details for Upload of Health Check Collection Results
• Viewing and Reattempting Failed Uploads
• Managing Oracle Health Check Collection Purges
Bulk Mapping Systems to Business Units
Oracle Health Check Collections Manager provides an XML bulk upload option so that you can quickly map many systems to business units.
Related Topics:
• Bulk Mapping Systems to Business Units
Oracle Health Check Collections Manager provides an XML bulk upload option so that you can quickly map many systems to business units.
Selectively Capturing Users During Log In
By default, Oracle Health Check Collections Manager captures details of the users who log in using LDAP authentication and assigns the Oracle Health Check Collections Manager role to those users.
Related Topics:
• Selectively Capturing Users During Login
Configure Oracle Health Check Collections Manager to capture user details and assign the users Oracle Health Check Collections Manager roles.
Configuring Details for Upload of Health Check Collection Results
Configure Oracle ORAchk and Oracle EXAchk to automatically upload health check collection results to the Oracle Health Check Collections Manager database.
Related Topics:
• Uploading Collections Automatically
Configure Oracle ORAchk and Oracle EXAchk to upload check results automatically to the Oracle Health Check Collections Manager database.
Viewing and Reattempting Failed Uploads
Use the new option -checkfaileduploads to find failed uploads.
Related Topics:
• Viewing and Reattempting Failed Uploads
Configure Oracle ORAchk and Oracle EXAchk to display and reattempt to upload the failed uploads.
Managing Oracle Health Check Collection Purges
Oracle Health Check Collections Manager now by default purges collections that are older than three months.
Related Topics:
• Adjusting or Disabling Old Collections Purging
Modify or disable the purge schedule for Oracle Health Check Collections Manager collection data.
Tracking Changes to File Attributes
Use the Oracle ORAchk and Oracle EXAchk option -fileattr to track changes to file attributes.
If run with the -fileattr option, then Oracle ORAchk and Oracle EXAchk search all files within the Oracle Grid Infrastructure and Oracle Database homes by default. You can also specify the list of directories, subdirectories, and files to monitor, and then compare snapshots for changes.
Related Topics:
• Tracking File Attribute Changes
Use the Oracle ORAchk and Oracle EXAchk -fileattr option and command flags to record and track file attribute settings, and compare snapshots.
• Tracking File Attribute Changes and Comparing Snapshots
Use the Oracle ORAchk and Oracle EXAchk -fileattr option and command flags to record and track file attribute settings, and compare snapshots.
Find Health Checks that Require Privileged Users to Run
Use the new privileged user filter to identify health checks that require a privileged user to run.
Related Topics:
• Finding Which Checks Require Privileged Users
Use the Privileged User filter in the Health Check Catalogs to find health checks that must be run by privileged users, such as root.
Support for Broader Range of Oracle Products
Health check support has been broadened to include Linux operating system health checks (Oracle ORAchk only), External ZFS Storage Appliance health checks (Oracle
EXAchk on Exalogic only), and Oracle Enterprise Manager Cloud Control 13.1.
Related Topics:
• Viewing Clusterwide Linux Operating System Health Check (VMPScan)
On Linux systems, view a summary of the VMPScan report in the Clusterwide Linux Operating System Health Check (VMPScan) section of the Health Check report.
Easier to Run Oracle EXAchk on Oracle Exadata Storage Servers
Run Oracle EXAchk from Oracle Exadata storage servers without SSH connectivity from the database server to the storage server.
Related Topics:
• Locking and Unlocking Storage Server Cells
Beginning with version 12.1.0.2.7, use Oracle EXAchk to lock and unlock storage server cells.
New Health Checks for Oracle ORAchk and Oracle EXAchk
New health checks have been included for:
• Oracle Exadata
• Oracle SuperCluster
• Oracle Exalogic
• Oracle ZFS Storage
• Oracle Enterprise Linux
• Oracle Solaris Cluster
• Oracle Database and Oracle Grid Infrastructure
• Oracle Enterprise Manager Cloud Control Oracle Management Service (OMS) and
Repository
Refer to My Oracle Support note 1268927.2 and click the Health Check Catalog tab to download and view the list of health checks.
Related Topics:
• https://support.oracle.com/rs?type=doc&id=1268927.2
New Features for Cluster Health Monitor 12.2.0.1.1
Two new parameters are added to the oclumon dumpnodeview command:
• oclumon dumpnodeview -format csv: This option provides CSV format output mode for dumpnodeview.
• oclumon dumpnodeview -procag: This option provides output of node view processes aggregated by category.
Related Topics:
• Use the command-line tool to query the Cluster Health Monitor repository to display node-specific metrics for a specific time period.
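For illustration, the new options might be combined with existing dumpnodeview parameters like this (the -allnodes and -last parameters are assumed from the OCLUMON command reference; adjust the interval to your needs):

```shell
# Dump the node view for all nodes over the last 15 minutes, as CSV
oclumon dumpnodeview -allnodes -last "00:15:00" -format csv

# Show node-view process data aggregated by category
oclumon dumpnodeview -procag
```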
New Features for Oracle Trace File Analyzer 12.2.0.1.1
Oracle Trace File Analyzer includes the following new features in release 12.2.0.1.1:
• Oracle Trace File Analyzer runs an automatic purge every 60 minutes to delete logs that are older than 30 days.
• Manage Automatic Diagnostic Repository (ADR) log and trace files by using the managelogs command.
• Oracle Trace File Analyzer now automatically monitors disk usage and records snapshots.
• Oracle Trace File Analyzer now provides event-driven Support Service Request
Data Collection (SRDC) collections.
• Oracle Trace File Analyzer integrates Incident Packaging Service (IPS), and can now run IPS to show incidents, problems, and packages. IPS packages can also be included in diagnostic collection with the option to manipulate them before packaging.
• Oracle Trace File Analyzer Built on Java Runtime Environment (JRE) 1.8.
Oracle Trace File Analyzer was built on Java Runtime Environment (JRE) 1.8 for this release. It uses the latest Java features. Bash shell is no longer required for
Oracle Trace File Analyzer. Because Oracle Trace File Analyzer runs on Java instead of as a shell script, it is now supported on Microsoft Windows platforms.
If you plan to use Oracle Trace File Analyzer, then JRE is now a requirement. JRE ships with Oracle Database and Oracle Grid Infrastructure. JRE is also included in the Oracle Trace File Analyzer Database Support Tools Bundle from My Oracle
Support Note 1513912.2.
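A sketch of how some of these features surface on the command line (the managelogs and ips verbs are named in the feature list above; the purge-age syntax shown is an assumption to be checked against the command-line appendix):

```shell
# Show Automatic Diagnostic Repository (ADR) disk usage
tfactl managelogs -show usage

# Purge ADR log and trace files older than 30 days
tfactl managelogs -purge -older 30d

# List incidents through the integrated Incident Packaging Service (IPS)
tfactl ips show incidents
```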
Related Topics:
• Supported Platforms and Product Versions
Review the supported platforms and product versions for Oracle Trace File Analyzer and Oracle Trace File Analyzer Collector.
• Oracle Grid Infrastructure Trace File Analyzer Installation
Oracle Trace File Analyzer is automatically configured as part of the Oracle Grid Infrastructure configuration when running root.sh or rootupgrade.sh.
• Running On-Demand Event-Driven SRDC Collections
Use the diagcollect -srdc option to collect diagnostics needed for an Oracle Support Service Request Data Collection (SRDC).
• Purging Oracle Trace File Analyzer Logs Automatically
Use these tfactl commands to manage the log file purge policy for Oracle Trace File Analyzer log files.
• Managing Automatic Diagnostic Repository Log and Trace Files
Use the managelogs command to manage Automatic Diagnostic Repository log and trace files.
• Use tfactl commands to manage Oracle Trace File Analyzer disk usage snapshots.
• Use the tfactl diagcollect command to perform on-demand diagnostic collection.
• https://support.oracle.com/rs?type=doc&id=1513912.2
New Features for Hang Manager
Hang Manager includes the following new features:
• Sensitivity Setting
Adjust the threshold period that Hang Manager waits to confirm whether a session is hung by setting the sensitivity parameter.
• Number of Trace Files Setting
Adjust the number of trace files that can be generated within the trace file sets by setting the base_file_set_count parameter.
• Size of Trace File Setting
Adjust the size (in bytes) of trace files by setting the base_file_size_limit parameter.
Related Topics:
• Optional Configuration for Hang Manager
You can adjust the sensitivity, and control the size and number of the log files used by Hang Manager.
New Features for Memory Guard
• Alert Notifications When Memory Pressure is Detected
Memory Guard now sends alert notifications if Memory Guard finds any server at risk. Find those notifications in the audit logs.
Related Topics:
• Use of Memory Guard in Oracle Real Application Clusters (Oracle RAC)
Memory Guard autonomously detects and monitors Oracle Real Application Clusters (Oracle RAC) or Oracle RAC One Node databases when they are open.
New Features for Oracle Database Quality of Service Management 12c Release 2 (12.2.0.1)
• New qosmserver to Replace OC4J J2EE Container
In earlier releases, Oracle Database Quality of Service Management Server was deployed in an OC4J J2EE container. OC4J J2EE is not supported on the latest versions of Java, and had a greater resource footprint than Oracle Database Quality of Service Management needs. A profiled version of Tomcat, known as the qosmserver, replaces the OC4J J2EE container.
• Full Support for Administrator-Managed and Multitenant Oracle RAC Databases
In Oracle Database 12c release 1 (12.1), Oracle Database Quality of Service
Management supported administrator-managed Oracle RAC and Oracle RAC One
Node databases in its Measure-Only and Monitor modes. In this release, you can use Oracle Database Quality of Service Management support in Management mode for administrator-managed Oracle RAC databases and Multitenant Oracle
RAC Databases. However, Oracle Database Quality of Service Management cannot expand or shrink the number of instances by changing the server pool size for administrator-managed databases because administrator-managed databases do not run in server pools. Oracle Enterprise Manager Cloud Control supports this new feature in the Oracle Database Quality of Service Management pages.
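As an illustrative command fragment, the qosmserver can be checked and started with standard srvctl resource verbs (assuming the 12.2 srvctl syntax for this resource type):

```shell
# Check whether the QoS Management server resource is running
srvctl status qosmserver

# Start it if it is not running
srvctl start qosmserver
```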
1 Introduction to Oracle Autonomous Health Framework
Oracle Autonomous Health Framework is a collection of components that analyzes the diagnostic data collected, and proactively identifies issues before they affect the health of your clusters or your Oracle Real Application Clusters (Oracle RAC) databases.
Most of the Oracle Autonomous Health Framework components are already available in Oracle Database 12c release 1 (12.1). In Oracle Database 12c release 2 (12.2), the output of several components is consolidated in the Grid Infrastructure
Management Repository (GIMR) and analyzed in real time to detect problematic patterns on the production clusters.
Topics:
• Oracle Autonomous Health Framework Problem and Solution Space
Oracle Autonomous Health Framework assists with monitoring, diagnosing, and preventing availability and performance issues.
• Components of Oracle Autonomous Health Framework
This section describes the diagnostic components that are part of Oracle Autonomous Health Framework.
1.1 Oracle Autonomous Health Framework Problem and Solution Space
Oracle Autonomous Health Framework assists with monitoring, diagnosing, and preventing availability and performance issues.
System administrators can use most of the components in Oracle Autonomous Health
Framework interactively during installation, patching, and upgrading. Database administrators can use Oracle Autonomous Health Framework to diagnose operational runtime issues and mitigate the impact of these issues.
• Availability issues are runtime issues that threaten the availability of the software stack.
• Performance issues are runtime issues that threaten the performance of the system.
1.1.1 Availability Issues
Availability issues are runtime issues that threaten the availability of the software stack.
Availability issues can result from either software issues (Oracle Database, Oracle
Grid Infrastructure, operating system) or the underlying hardware resources (CPU,
Memory, Network, Storage).
The components within Oracle Autonomous Health Framework address the following availability issues:
Examples of Server Availability Issues
Server availability issues can cause a server to be evicted from the cluster and shut down all the database instances that are running on the server.
Examples of such issues are:
• Issue: Memory stress, caused by a server running out of free physical memory, results in the operating system Swapper process running for extended periods of time, moving memory to disk. Swapping prevents time-critical cluster processes from running and eventually causes the node to be evicted.
Solution: Memory Guard detects the memory stress in advance and causes work to be drained to free up memory.
• Issue: Network congestion on the private interconnect can cause time-critical internode or storage I/O to have excessive latency or dropped packets. This type of failure typically builds up and can be detected early, and corrected or relieved.
Solution: If a change in the server configuration causes this issue, then Cluster
Verification Utility (CVU) detects it if the issue persists for more than an hour.
However, Oracle Cluster Health Advisor detects the issue within minutes and presents corrective actions.
• Issue: Network failures on the private interconnect caused by a pulled cable or failed network interface card (NIC) can immediately result in evicted nodes.
Solution: Although these types of network failures cannot be detected early, the cause can be narrowed down by using Cluster Health Monitor and Oracle Trace
File Analyzer to pinpoint the time of the failure and the network interfaces involved.
Examples of Database Availability Issues
Database availability issues can cause an Oracle database or one of the instances of the database to become unresponsive and thus unavailable to users.
Examples of such issues are:
• Issue: Runaway queries or hangs can deny critical database resources, such as locks, latches, or CPU, to other sessions. Denial of critical database resources results in the database or an instance of the database being unresponsive to applications.
Solution: Hang Manager detects and automatically resolves these types of hangs.
Also, Oracle Cluster Health Advisor detects, identifies, and notifies the database administrator of such hangs and provides an appropriate corrective action.
• Issue: Denial-of-service (DoS) attacks, vulnerabilities, or simply software bugs can cause a database or a database instance to be unresponsive.
Solution: Proactive recommendations of known issues and their resolutions provided by Oracle ORAchk can prevent such occurrences. If these issues are not prevented, then automatic collection of logs by Oracle Trace File Analyzer, in addition to data collected by Cluster Health Monitor, can speed up the correction of these issues.
• Issue: Configuration changes can cause database outages that are difficult to troubleshoot. For example, incorrect permissions on the oracle.bin file can prevent session processes from being created.
Solution: Use Cluster Verification Utility and Oracle ORAchk to speed up identification and correction of these types of issues. You can generate a diff report using Oracle ORAchk to see a baseline comparison of two reports and a list of differences. You can also view configuration reports created by Cluster
Verification Utility to verify whether your system meets the criteria for an Oracle installation.
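As a sketch, the baseline-comparison workflow described above might look like this on the command line (the collection names and save directory are placeholders; consult the Oracle ORAchk and CVU command references for the full syntax):

```shell
# Compare two earlier Oracle ORAchk collections to see what changed
# between them (collection names are placeholders).
orachk -diff orachk_myhost_mydb_0101 orachk_myhost_mydb_0201

# Collect a CVU baseline across all nodes, saved for later comparison
# (the save directory is illustrative).
cluvfy comp baseline -collect all -n all -savedir /tmp/cvu_baseline
```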
1.1.2 Performance Issues
Performance issues are runtime issues that threaten the performance of the system.
Performance issues can result from either software issues (bugs, configuration problems, data contention, and so on) or client issues (demand, query types, connection management, and so on).
Server and database performance issues are intertwined and difficult to separate. It is easier to categorize them by their origin: database server or client.
Examples of Database Server Performance Issues
• Issue: Deviations from best practices in configuration can cause database server performance issues.
Solution: Oracle ORAchk detects configuration issues when Oracle ORAchk runs periodically and notifies the database administrator of the appropriate corrective settings.
• Issue: Bottlenecked resources or poorly constructed SQL statements can cause database server performance issues.
Solution: Oracle Database Quality of Service (QoS) Management flags these issues and generates notifications when the issues put Service Level Agreements
(SLAs) at risk. Oracle Cluster Health Advisor detects when the issues exceed normal operating conditions and notifies the database administrator with corrective actions.
• Issue: A session can cause other sessions to slow down waiting for the blocking session to release its resource or complete its work.
Solution: Hang Manager detects these chains of sessions and automatically kills the root holder session to relieve the bottleneck.
• Issue: Unresolved known issues or unpatched bugs can cause database server performance issues.
Solution: These issues can be detected through the automatic Oracle ORAchk reports and flagged with associated patches or workarounds. Oracle ORAchk is regularly enhanced to include new critical issues, either in existing products or in new product areas.
Examples of Performance Issues Caused by Database Client
• Issue: When a server is hosting more database instances than its resources and client load can manage, performance suffers because of waits for CPU, I/O, or memory.
Solution: Oracle ORAchk and Oracle Database QoS Management detect when these issues are the result of misconfiguration such as oversubscribing of CPUs, memory, or background processes. Oracle ORAchk and Oracle Database QoS
Management notify you with corrective actions.
• Issue: Misconfigured parameters such as SGA and PGA allocation, number of sessions or processes, CPU counts, and so on, can cause database performance degradation.
Solution: Oracle ORAchk and Oracle Cluster Health Advisor detect the settings and consequences respectively and notify you automatically with recommended corrective actions.
• Issue: A surge in client connections can exceed the server or database capacity, causing timeout errors and other performance problems.
Solution: Oracle Database QoS Management and Oracle Cluster Health Advisor automatically detect the performance degradation. Also, Oracle Database QoS
Management and Oracle Cluster Health Advisor notify you with corrective actions to relieve the bottleneck and restore performance.
1.2 Components of Oracle Autonomous Health Framework
This section describes the diagnostic components that are part of Oracle Autonomous
Health Framework.
• Introduction to Oracle ORAchk and Oracle EXAchk
Oracle ORAchk and Oracle EXAchk provide a lightweight and non-intrusive health check framework for the Oracle stack of software and hardware components.
• Introduction to Cluster Health Monitor
Cluster Health Monitor is a component of Oracle Grid Infrastructure that continuously monitors and stores Oracle Clusterware and operating system resource metrics.
• Introduction to Oracle Trace File Analyzer Collector
Oracle Trace File Analyzer Collector is a utility for targeted diagnostic collection that simplifies diagnostic data collection for Oracle Clusterware, Oracle Grid Infrastructure, and Oracle Real Application Clusters (Oracle RAC) systems, in addition to single instance, non-clustered databases.
• Introduction to Oracle Cluster Health Advisor
Oracle Cluster Health Advisor continuously monitors cluster nodes and Oracle RAC databases for performance and availability issue precursors to provide early warning of problems before they become critical.
• Introduction to Memory Guard
Memory Guard is an Oracle Real Application Clusters (Oracle RAC) environment feature that monitors the cluster nodes to prevent node stress caused by the lack of memory.
• Introduction to Hang Manager
Hang Manager is an Oracle Real Application Clusters (Oracle RAC) environment feature that autonomously resolves hangs and keeps the resources available.
• Introduction to Oracle Database Quality of Service (QoS) Management
Oracle Database Quality of Service (QoS) Management manages the resources that are shared across applications.
1.2.1 Introduction to Oracle ORAchk and Oracle EXAchk
Oracle ORAchk and Oracle EXAchk provide a lightweight and non-intrusive health check framework for the Oracle stack of software and hardware components.
Oracle ORAchk and Oracle EXAchk:
• Automate risk identification and proactive notification before your business is impacted
• Run health checks based on critical and reoccurring problems
• Present high-level reports about your system health risks and vulnerabilities to known issues
• Enable you to drill down into specific problems and understand their resolutions
• Enable you to schedule recurring health checks at regular intervals
• Send email notifications and diff reports while running in daemon mode
• Integrate findings into Oracle Health Check Collections Manager and other tools of your choice
• Run in your environment with no need to send anything to Oracle
You have access to Oracle ORAchk and Oracle EXAchk as a value add-on to your existing support contract. There is no additional fee or license required to run Oracle ORAchk and Oracle EXAchk.
Use Oracle EXAchk for Oracle Engineered Systems except for Oracle Database
Appliance. For all other systems, use Oracle ORAchk.
Run health checks for Oracle products using the command-line options.
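For example, a basic interactive run might look like the following (the -dbnames value is a placeholder; see the command-line options reference for the full list):

```shell
# Run the full set of Oracle ORAchk health checks interactively.
orachk

# On Oracle Engineered Systems (other than Oracle Database Appliance),
# run Oracle EXAchk instead.
exachk

# Limit the checks to one database (MYDB is a placeholder name).
orachk -dbnames MYDB
```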
Related Topics:
• Oracle ORAchk and Oracle EXAchk Command-Line Options
Most command-line options apply to both Oracle ORAchk and Oracle EXAchk. Use the command options to control the behavior of Oracle ORAchk and Oracle EXAchk.
• Analyzing Risks and Complying with Best Practices
Use the configuration audit tools Oracle ORAchk and Oracle EXAchk to assess your Oracle Engineered Systems and non-Engineered Systems for known configuration problems and best practices.
1.2.2 Introduction to Cluster Health Monitor
Cluster Health Monitor is a component of Oracle Grid Infrastructure that continuously monitors and stores Oracle Clusterware and operating system resource metrics.
Enabled by default, Cluster Health Monitor:
• Assists node eviction analysis
• Logs all process data locally
• Enables you to define pinned processes
• Listens to CSS and GIPC events
• Categorizes processes by type
• Supports plug-in collectors such as traceroute, netstat, ping, and so on
• Provides CSV output for ease of analysis
Cluster Health Monitor serves as a data feed for other Oracle Autonomous Health
Framework components such as Oracle Cluster Health Advisor and Oracle Database
Quality of Service Management.
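Cluster Health Monitor data can be inspected with the oclumon utility; as a hedged sketch (the node name and time window are placeholders):

```shell
# Dump the last 15 minutes of node metrics collected by
# Cluster Health Monitor (node1 is a placeholder node name).
oclumon dumpnodeview -n node1 -last "00:15:00"

# Show the current size of the Cluster Health Monitor repository.
oclumon manage -get repsize
```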
Related Topics:
• Collecting Operating System Resources Metrics
Use Cluster Health Monitor to collect diagnostic data to analyze performance degradation or failures of critical operating system resources.
1.2.3 Introduction to Oracle Trace File Analyzer Collector
Oracle Trace File Analyzer Collector is a utility for targeted diagnostic collection that simplifies diagnostic data collection for Oracle Clusterware, Oracle Grid Infrastructure, and Oracle Real Application Clusters (Oracle RAC) systems, in addition to single instance, non-clustered databases.
Enabled by default, Oracle Trace File Analyzer:
• Provides comprehensive first failure diagnostics collection
• Efficiently collects, packages, and transfers diagnostic data to Oracle Support
• Reduces round trips between customers and Oracle
Oracle Trace File Analyzer Collector and Oracle Trace File Analyzer reduce the time required to obtain the correct diagnostic data, which eventually saves your business money.
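As an illustrative sketch of a targeted collection (the time windows are placeholders; see the Oracle Trace File Analyzer documentation for the full syntax):

```shell
# Collect diagnostic data covering the last 12 hours into a single package.
tfactl diagcollect -last 12h

# Scan recent trace and log files for errors before deciding what to collect.
tfactl analyze -last 1h
```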
Related Topics:
• Collecting Diagnostic Data and Triaging, Diagnosing, and Resolving Issues
Use Oracle Trace File Analyzer to collect comprehensive diagnostic data that saves you time and money.
1.2.4 Introduction to Oracle Cluster Health Advisor
Oracle Cluster Health Advisor continuously monitors cluster nodes and Oracle RAC databases for performance and availability issue precursors to provide early warning of problems before they become critical.
Oracle Cluster Health Advisor is integrated into Oracle Enterprise Manager Cloud
Control (EMCC) Incident Manager.
Oracle Cluster Health Advisor does the following:
• Detects node and database performance problems
• Provides early-warning alerts and corrective action
• Supports on-site calibration to improve sensitivity
In Oracle Database 12c release 2 (12.2.0.1), Oracle Cluster Health Advisor supports the monitoring of two critical subsystems of Oracle Real Application Clusters (Oracle
RAC): the database instance and the host system. Oracle Cluster Health Advisor determines and tracks the health status of the monitored system. It periodically samples a wide variety of key measurements from the monitored system.
Over a hundred database and cluster node problems have been modeled, and the specific operating system and Oracle Database metrics that indicate the development or existence of these problems have been identified. This information is used to
construct a trained, calibrated model that is based on a normal operational period of the target system.
Oracle Cluster Health Advisor runs an analysis multiple times a minute. Oracle Cluster
Health Advisor estimates an expected value of an observed input based on the default model. Oracle Cluster Health Advisor then performs anomaly detection for each input based on the difference between observed and expected values. If sufficient inputs associated with a specific problem are abnormal, then Oracle Cluster Health Advisor raises a warning and generates an immediate targeted diagnosis and corrective action.
Oracle Cluster Health Advisor models are conservative to prevent false warning notifications. However, the default configuration may not be sensitive enough for critical production systems. Therefore, Oracle Cluster Health Advisor provides an onsite model calibration capability to use actual production workload data to form the basis of its default setting and increase the accuracy and sensitivity of node and database models.
Oracle Cluster Health Advisor stores the analysis results, along with diagnosis information, corrective action, and metric evidence for later triage, in the Grid
Infrastructure Management Repository (GIMR). Oracle Cluster Health Advisor also sends warning messages to Enterprise Manager Cloud Control using the Oracle
Clusterware event notification protocol.
You can also use Oracle Cluster Health Advisor to diagnose and triage past problems.
You specify the past dates through Oracle Enterprise Manager Cloud Control (EMCC)
Incident Manager or through the command-line interface CHACTL. Manage the capability of Oracle Cluster Health Advisor to review past problems by configuring the retention setting for Oracle Cluster Health Advisor's tablespace in the Grid
Infrastructure Management Repository (GIMR). The default retention period is 72 hours.
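A hedged sketch of this triage workflow with CHACTL (the database name, timestamps, and retention value are placeholders):

```shell
# Query Oracle Cluster Health Advisor findings for a past time window.
chactl query diagnosis -db mydb -start "2017-05-01 01:00:00" -end "2017-05-01 03:00:00"

# Inspect the GIMR repository configuration, then extend the retention
# period (specified in hours; the default is 72).
chactl query repository
chactl set maxretention -time 168
```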
Related Topics:
• Proactively Detecting and Diagnosing Performance Issues for Oracle RAC
Oracle Cluster Health Advisor provides system and database administrators with early warning of pending performance issues, and root causes and corrective actions for Oracle RAC databases and cluster nodes.
1.2.5 Introduction to Memory Guard
Memory Guard is an Oracle Real Application Clusters (Oracle RAC) environment feature to monitor the cluster nodes to prevent node stress caused by the lack of memory.
Enabled by default, Memory Guard:
• Analyzes over-committed memory conditions once every minute
• Issues an alert if any server is at risk
• Protects applications by automatically closing the server to new connections
• Stops all CRS-managed services transactionally on the server
• Reopens the server to connections once the memory pressure has subsided
Enterprise database servers can use all available memory due to too many open sessions or runaway workloads. Running out of memory can result in failed
transactions or, in extreme cases, a restart of the node and the loss of availability of resources for your applications.
Memory Guard autonomously collects memory metrics for every node from Cluster Health Monitor to determine whether the nodes have insufficient memory. If the memory is insufficient, then Memory Guard prevents new database sessions from being created, allowing the existing workload to complete and free its memory. New sessions are started automatically when the memory stress is relieved.
Related Topics:
• Memory Guard continuously monitors and ensures the availability of cluster nodes by preventing the nodes from being evicted when the nodes are stressed due to lack of memory.
1.2.6 Introduction to Hang Manager
Hang Manager is an Oracle Real Application Clusters (Oracle RAC) environment feature that autonomously resolves hangs and keeps the resources available.
Enabled by default, Hang Manager:
• Reliably detects database hangs and deadlocks
• Autonomously resolves database hangs and deadlocks
• Supports Oracle Database QoS Performance Classes, Ranks, and Policies to maintain SLAs
• Logs all detections and resolutions
• Provides a SQL interface to configure sensitivity (Normal/High) and trace file sizes
A database hangs when a session blocks a chain of one or more sessions. The blocking session holds a resource such as a lock or latch that prevents the blocked sessions from progressing. The chain of sessions has a root or a final blocker session, which blocks all the other sessions in the chain. Hang Manager resolves these issues autonomously by detecting and resolving the hangs.
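The SQL interface mentioned above is exposed through the DBMS_HANG_MANAGER PL/SQL package. As a hedged sketch, sensitivity could be raised from Normal to High as follows (run with SYSDBA privileges; verify the package constants against your release):

```shell
# Raise Hang Manager sensitivity via its SQL interface.
sqlplus -s / as sysdba <<'EOF'
EXEC dbms_hang_manager.set(dbms_hang_manager.sensitivity, dbms_hang_manager.sensitivity_high);
EOF
```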
Related Topics:
• Resolving Database and Database Instance Hangs
Hang Manager preserves the database performance by resolving hangs and keeping the resources available.
1.2.7 Introduction to Oracle Database Quality of Service (QoS) Management
Oracle Database Quality of Service (QoS) Management manages the resources that are shared across applications.
Oracle Database Quality of Service (QoS) Management:
• Requires Oracle Grid Infrastructure 12.1.0.2 or later
• Delivers a cluster-wide dashboard of Key Performance Indicators
• Phases in through Measure, Monitor, and then Management modes
Oracle Database Quality of Service (QoS) Management adjusts the system configuration to keep the applications running at the performance levels needed by your business.
Many companies are consolidating and standardizing their data center computer systems. Instead of using individual servers for each application, the companies run multiple applications on clustered databases. In addition, migration of applications to the Internet has introduced the problem of managing an open workload. An open workload is subject to demand surges that can overload a system. Overloading a system results in a new type of application failure that cannot be fully anticipated or planned for. To keep the applications available and performing within their target service levels in this type of environment, you must:
• Pool resources
• Have management tools that detect performance bottlenecks in real time
• Reallocate resources to meet the change in demand
Oracle Database QoS Management responds gracefully to changes in system configuration and demand, thus avoiding oscillations in the performance levels of your applications.
Oracle Database QoS Management monitors the performance of each work request on a target system. Oracle Database QoS Management starts to track a work request from the time the work request tries to establish a connection to the database using a database service. The time required to complete a work request, or the response time, is the time from when the request for data was initiated to when the data request is completed. The response time is also known as the end-to-end response time, or round-trip time. By accurately measuring the two components of response time, Oracle Database QoS Management quickly detects bottlenecks in the system. Oracle Database QoS Management then suggests reallocating resources to relieve a bottleneck, thus preserving or restoring service levels.
Oracle Database QoS Management manages the resources on your system so that:
• When sufficient resources are available to meet the demand, business-level performance requirements for your applications are met, even if the workload changes
• When sufficient resources are not available to meet the demand, Oracle Database
QoS Management attempts to satisfy the more critical business performance requirements at the expense of less critical performance requirements
Related Topics:
• Monitoring and Managing Database Workload Performance
Oracle Database Quality of Service (QoS) Management is an automated, policy-based product that monitors the workload requests for an entire system.
2
Analyzing Risks and Complying with Best Practices
Use configuration audit tools Oracle ORAchk and Oracle EXAchk to assess your
Oracle Engineered Systems and non-Engineered Systems for known configuration problems and best practices.
This chapter describes how to use Oracle ORAchk and Oracle EXAchk.
Topics:
• Using Oracle ORAchk and Oracle EXAchk to Automatically Check for Risks and System Health
Oracle recommends that you use the daemon process to schedule recurring health checks at regular intervals.
• Email Notification and Health Check Report Overview
The following sections provide a brief overview of email notifications and the sections of the HTML report output.
• Configuring Oracle ORAchk and Oracle EXAchk
To configure Oracle ORAchk and Oracle EXAchk, use the procedures explained in this section.
• Using Oracle ORAchk and Oracle EXAchk to Manually Generate Health Check Reports
This section explains the procedures to manually generate health check reports.
• Managing the Oracle ORAchk and Oracle EXAchk Daemons
This section explains the procedures to manage the Oracle ORAchk and Oracle EXAchk daemons.
• The Incidents tab gives you a complete system for tracking support incidents.
• Tracking File Attribute Changes and Comparing Snapshots
Use the Oracle ORAchk and Oracle EXAchk -fileattr option and command flags to record and track file attribute settings, and compare snapshots.
• Collecting and Consuming Health Check Data
Oracle Health Check Collections Manager for Oracle Application Express 4.2 provides you an enterprise-wide view of your health check collection data.
• Locking and Unlocking Storage Server Cells
Beginning with version 12.1.0.2.7, use Oracle EXAchk to lock and unlock storage server cells.
• Integrating Health Check Results with Other Tools
Integrate health check results from Oracle ORAchk and Oracle EXAchk into Enterprise Manager and other third-party tools.
• Troubleshooting Oracle ORAchk and Oracle EXAchk
To troubleshoot and fix Oracle ORAchk and Oracle EXAchk issues, follow the steps explained in this section.
Related Topics:
• Introduction to Oracle ORAchk and Oracle EXAchk
Oracle ORAchk and Oracle EXAchk provide a lightweight and non-intrusive health check framework for the Oracle stack of software and hardware components.
2.1 Using Oracle ORAchk and Oracle EXAchk to Automatically Check for Risks and System Health
Oracle recommends that you use the daemon process to schedule recurring health checks at regular intervals.
Configure the daemon to:
• Schedule recurring health checks at regular intervals
• Send email notifications when the health check runs complete, clearly showing any differences since the last run
• Purge collection results after a pre-determined period
• Check and send email notification about stale passwords
• Store multiple profiles for automated health check runs
• Restart automatically if the server or node where it is running restarts
Note:
While running, the daemon answers all the prompts required by subsequent on-demand health checks.
To run on-demand health checks, do not use the daemon process started by others. Run on-demand health checks within the same directory where you have started the daemon.
If you change the system configuration such as adding or removing servers or nodes, then restart the daemon.
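For example, the daemon options and lifecycle described above might be managed as follows (the schedule and retention values are illustrative; exact option names are in the daemon reference):

```shell
# Set daemon options before starting it.
orachk -set "AUTORUN_SCHEDULE=3 * * *"   # run daily at 03:00
orachk -set "COLLECTION_RETENTION=30"    # purge collections after 30 days

# Start the daemon, check its status, and see the next planned run.
orachk -d start
orachk -d status
orachk -d nextautorun
```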
Related Topics:
• Setting and Getting Options for the Daemon
Set the daemon options before you start the daemon. Reset the daemon options anytime after starting the daemon.
• Starting and Stopping the Daemon
Start and stop the daemon and force the daemon to stop a health check run.
• Querying the Status and Next Planned Daemon Run
Query the status and next automatic run schedule of the running daemon.
• Configuring the Daemon for Automatic Restart
By default, you must manually restart the daemon if you restart the server or node on which the daemon is running.
2.2 Email Notification and Health Check Report Overview
The following sections provide a brief overview of email notifications and the sections of the HTML report output.
• First Email Notification
After completing health check runs, the daemon emails the assessment report as an HTML attachment to all users that you have specified in the NOTIFICATION_EMAIL list.
• What does the Health Check Report Contain?
Health check reports contain the health status of each system grouped under different sections of the report.
• Subsequent Email Notifications
For the subsequent health check runs after the first email notification, the daemon emails the summary of differences between the most recent runs.
Related Topics:
• The diff report attached to the previous email notification shows a summary of differences between the most recent runs.
2.2.1 First Email Notification
After completing health check runs, the daemon emails the assessment report as an HTML attachment to all users that you have specified in the NOTIFICATION_EMAIL list.
2.2.2 What does the Health Check Report Contain?
Health check reports contain the health status of each system grouped under different sections of the report.
The HTML report output contains the following:
• Health score
• Summary of health check runs
• Table of contents
• Controls for report features
• Findings
• Recommendations
Details of the report output are different on each system. The report is dynamic, and therefore the tools display certain sections only if applicable.
System Health Score and Summary
System Health Score and Summary report provide:
• A high-level health score based on the number of passed or failed checks
• A summary of the health check run, which includes:
– Name, for example, Cluster Name
– Version of the operating system kernel
– Path, version, name of homes, for example, CRS, DB, and EM Agent
– Version of the component checked, for example, Exadata
– Number of nodes checked, for example, database server, storage servers,
InfiniBand switches
– Version of Oracle ORAchk and Oracle EXAchk
– Name of the collection output
– Date and time of collection
– Duration of the check
– Name of the user who ran the check, for example, root
– How long the check is valid
Table of Contents and Report Feature
The Table of Contents section provides links to major sections in the report:
• Database Server
• Storage Server
• InfiniBand Switch
• Cluster Wide
• Maximum Availability Architecture (MAA) Scorecard
• Infrastructure Software and Configuration Summary
• Findings needing further review
• Platinum Certification
• System-wide Automatic Service Request (ASR) health check
• Skipped Checks
• Top 10 Time Consuming Checks
The Report Feature section enables you to:
• Filter checks based on their statuses
• Select the regions
• Expand or collapse all checks
• View check IDs
• Remove findings from the report
• Get a printable view
Report Findings
The Report Findings section displays the result of each health check grouped by technology components, such as Database Server, Storage Server, InfiniBand Switch, and Cluster Wide.
Each section shows:
• Check status (FAIL, WARNING, INFO, or PASS)
• Type of check
• Check message
• Where the check was run
• Link to expand details for further findings and recommendations
Click View for more information about the health check results and the recommendations, including:
• What to do to solve the problem
• Where the recommendation applies
• Where the problem does not apply
• Links to relevant documentation or My Oracle Support notes
• Example of data on which the recommendation is based
Maximum Availability Architecture (MAA) Scorecard
The Maximum Availability Architecture (MAA) Scorecard displays the recommendations for the software installed on your system.
The details include:
• Outage Type
• Status of the check
• Description of the problem
• Components found
• Host location
• Version of the components compared to the recommended version
• Status based on comparing the version found to the recommended version
2.2.3 Subsequent Email Notifications
For the subsequent health check runs after the first email notification, the daemon emails the summary of differences between the most recent runs.
Specify a list of comma-delimited email addresses in the NOTIFICATION_EMAIL option.
The email notification contains:
• System Health Score of this run compared to the previous run
• Summary of number of checks that were run and the differences between runs
• Most recent report result as attachment
• Previous report result as attachment
• Diff report as attachment
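For example, the NOTIFICATION_EMAIL option takes a comma-delimited list (the addresses are placeholders):

```shell
# Subscribe two recipients to daemon email notifications.
orachk -set "NOTIFICATION_EMAIL=dba1@example.com,dba2@example.com"

# Confirm the current value of the option.
orachk -get NOTIFICATION_EMAIL
```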
2.3 Configuring Oracle ORAchk and Oracle EXAchk
To configure Oracle ORAchk and Oracle EXAchk, use the procedures explained in this section.
• Deciding Which User Should Run Oracle ORAchk or Oracle EXAchk
Run health checks as root. Also, run health checks as the Oracle Database home owner or the Oracle Grid Infrastructure home owner.
• Handling of Root Passwords
Handling of root passwords depends on whether you have installed the Expect utility.
• Configuring Email Notification System
Oracle Health Check Collections Manager provides an email notification system that users can subscribe to.
2.3.1 Deciding Which User Should Run Oracle ORAchk or Oracle EXAchk
Run health checks as root. Also, run health checks as the Oracle Database home owner or the Oracle Grid Infrastructure home owner.
Most health checks do not require root access. However, you need root privileges to run a subset of health checks.
To run root privilege checks, Oracle ORAchk uses the script root_orachk.sh and Oracle EXAchk uses the script root_exachk.sh.
By default, the root_orachk.sh and root_exachk.sh scripts are created in the temporary directory used by Oracle ORAchk and Oracle EXAchk, that is, $HOME. Change the temporary directory by setting the environment variable RAT_TMPDIR.
For security reasons, create the root scripts outside of the standard temporary directory, in a custom directory.
To decide which user should run Oracle ORAchk and Oracle EXAchk:
1. Specify the custom directory using the RAT_ROOT_SH_DIR environment variable.
export RAT_ROOT_SH_DIR=/orahome/oradb/
2. Specify a location for sudo remote access.
export RAT_ROOT_SH_DIR=/mylocation
3. Add an entry in the /etc/sudoers file.
oracle ALL=(root) NOPASSWD:/mylocation/root_orachk.sh
Note:
Specify full paths for the entries in the /etc/sudoers file. Do not use environment variables.
4. (recommended) Run Oracle ORAchk and Oracle EXAchk as root.
Use root user credentials to run Oracle ORAchk and Oracle EXAchk.
The Oracle ORAchk and Oracle EXAchk processes that run as root perform user lookups for the users who own the Oracle Database home and Oracle Grid Infrastructure home. If root access is not required, then the Oracle ORAchk and Oracle EXAchk processes use the su command to run health checks as the applicable Oracle Database home user or Oracle Grid Infrastructure home user.
Accounts with lower privileges cannot have elevated access to run health checks that require root access.
Running health checks as root has advantages in role-separated environments or environments with more restrictive security.
5. Run Oracle ORAchk and Oracle EXAchk as the Oracle Database home owner or Oracle Grid Infrastructure home owner:
Use Oracle Database home owner or Oracle Grid Infrastructure home owner credentials to run Oracle ORAchk and Oracle EXAchk.
The user who runs Oracle ORAchk and Oracle EXAchk must have elevated access as root to run health checks that need root access.
Running health checks as the Oracle Database home owner or Oracle Grid Infrastructure home owner requires multiple runs in role-separated environments. More restrictive security requirements do not permit elevated access.
There are several other options:
• Skip the checks that require root access.
• Specify the root user ID and password when prompted.
• Configure sudo.
If you are using sudo, then add an entry for the temporary directory ($HOME) in the /etc/sudoers file that corresponds to the user who is running the health checks.
To determine what $HOME is set to, run the echo $HOME command.
For example:
user ALL=(root) NOPASSWD:/root/.orachk/root_orachk.sh
user ALL=(root) NOPASSWD:/root/.exachk/root_exachk.sh
• Pre-configure passwordless SSH connectivity.
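If you choose the sudo option, the /etc/sudoers entry must name the full path of the root script, with no environment variables. The following sketch composes such an entry from a user, home directory, and tool name; the helper itself is hypothetical, and you should always add the resulting line with visudo.

```shell
# Hypothetical helper: build the /etc/sudoers line for the root script that is
# created under a user's home directory, matching the examples above.
make_sudoers_entry() {
  user=$1; home=$2; tool=$3   # e.g. oracle /home/oracle orachk
  printf '%s ALL=(root) NOPASSWD:%s/.%s/root_%s.sh\n' "$user" "$home" "$tool" "$tool"
}

make_sudoers_entry oracle /home/oracle orachk
```

Run visudo and paste the printed line rather than editing /etc/sudoers directly.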
2.3.2 Handling of Root Passwords
Handling of root passwords depends on whether you have installed the Expect utility.
Expect automates interactive applications such as Telnet, FTP, passwd, fsck, rlogin, tip, and so on.
To handle root passwords:
1. If you have installed the Expect utility, then specify the root password when you run the health checks for the first time.
The Expect utility stores the password and uses the stored password for subsequent sessions.
The Expect utility prompts you to check if the root password is the same for all the remote components, such as databases, switches, and so on.
2. Specify the password only once if you have configured the same root password for all the components.
3. If the root password is not the same for all the components, then the Expect utility prompts you to validate the root password every time you run the health checks.
If you enter the password incorrectly or the password is changed between the time it is entered and used, then Oracle ORAchk and Oracle EXAchk:
• Notify you
• Skip relevant checks
Run the health checks after resolving the issues.
If Oracle ORAchk and Oracle EXAchk skip any of the health checks, then the tools log details about the skipped checks in the report output.
Related Topics:
• Expect - Home Page
2.3.3 Configuring Email Notification System
Oracle Health Check Collections Manager provides an email notification system that users can subscribe to.
The setup involves:
• Configuring the email server, port, and the frequency of email notifications.
• Registering the email address
Note:
Only the users who are assigned the Admin role can manage Email Notification Server and Job details.
To configure the email notification system:
1. Log in to Oracle Health Check Collections Manager, and then click Administration at the upper-right corner.
Figure 2-1 Oracle Health Check Collections Manager - Administration
2. Under Administration, click Manage Email Server & Job Details.
Figure 2-2 Oracle Health Check Collections Manager - Configure Email Server
a. Specify a valid Email Server Name, Port Number, and then click Set My Email Server Settings.
b. Set Email Notification Frequency as per your needs.
See the Notification Job Run Details on the same page.
Figure 2-3 Oracle Health Check Collections Manager - Notification Job Run status details
3. Go back to the Administration page, and click Manage Notifications.
Figure 2-4 Oracle Health Check Collections Manager - Manage Notifications
a. If you are configuring for the first time, then enter your email address.
Subsequent access to the Manage Notifications page shows your email address automatically.
b. By default, Subscribe/Unsubscribe My Mail Notifications is checked. Leave as is.
c. Under Collection Notifications, choose the type of collections for which you want to receive notifications.
d. Select to receive notification when the available space in the ORAchk CM Tablespace falls below 100 MB.
e. Validate the notification delivery by clicking Test under Test your email settings.
If the configuration is correct, then you should receive an email. If you do not receive an email, then check with your administrator.
Following is a sample notification:
From: [email protected]
Sent: Thursday, January 28, 2016 12:21 PM
Subject: Test Mail From Collection Manager
Testing Collection Manager Email Notification System
f. Click Submit.
Note:
The Manage Notifications section under the Administration menu is available to all users, irrespective of role.
If the ACL system is enabled, then the registered users receive notifications for the systems that they have access to. If the ACL system is not configured, then all the registered users receive all notifications.
Depending on the selections you made under the Collection Notifications section, you receive an email with the subject Collection Manager Notifications, containing the application URL with results.
Figure 2-5 Oracle Health Check Collections Manager - Sample Email Notification
Under the Comments column, click the Click here links for details. Click the respective URLs, authenticate, and then view the respective comparison report.
Figure 2-6 Oracle Health Check Collections Manager - Sample Diff Report
2.4 Using Oracle ORAchk and Oracle EXAchk to Manually Generate Health Check Reports
This section explains the procedures to manually generate health check reports.
• Running Health Checks On-Demand
Usually, health checks run at scheduled intervals. However, Oracle recommends that you run health checks on-demand when needed.
• Running Health Checks in Silent Mode
Run health checks automatically by scheduling them with the Automated Daemon Mode operation.
• Running On-Demand With or Without the Daemon
When running on-demand, if the daemon is running, then the daemon answers all prompts where possible, including the passwords.
• Generating a Diff Report
The diff report attached to the previous email notification shows a summary of differences between the most recent runs.
• Sending Results by Email
Optionally email the HTML report to one or more recipients using the -sendemail option.
2.4.1 Running Health Checks On-Demand
Usually, health checks run at scheduled intervals. However, Oracle recommends that you run health checks on-demand when needed.
Examples of when you must run health checks on-demand:
• Pre- or post-upgrades
• Machine relocations from one subnet to another
• Hardware failure or repair
• Problem troubleshooting
• In addition to go-live testing
To start on-demand health check runs, log in to the system as an appropriate user, and then run an appropriate tool. Specify the options to direct the type of run that you want.
$ ./orachk
$ ./exachk
Note:
To avoid problems while running the tool from terminal sessions on a network attached workstation or laptop, consider running the tool using VNC. If there is a network interruption, then the tool continues to process to completion. If the tool fails to run, then re-run the tool. The tool does not resume from the point of failure.
Output varies depending on your environment and options used:
• The tool starts discovering your environment
• If you have configured passwordless SSH equivalency, then the tool does not prompt you for passwords
• If you have not configured passwordless SSH for a particular component at the required access level, then the tool prompts you for password
• If the daemon is running, then the commands are sent to the daemon process that answers all prompts, such as selecting the database and providing passwords
• If the daemon is not running, then the tool prompts you for required information, such as which database you want to run against, the required passwords, and so on
• The tool investigates the status of the discovered components
Note:
If you are prompted for passwords, then the Expect utility runs when available.
In this way, the passwords are gathered at the beginning, and the Expect utility supplies the passwords when needed at the root password prompts. Because the Expect utility supplies the passwords, the tool can continue without the need for further input. If you do not use the Expect utility, then closely monitor the run and enter the passwords interactively as prompted.
Without the Expect utility installed, you must enter passwords many times depending on the size of your environment. Therefore, Oracle recommends that you use the Expect utility.
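Before an interactive run, you can check whether the Expect binary is on the PATH. This sketch only tests for the binary (package names vary by platform) and is not part of the tools.

```shell
# Report whether the expect binary is available; if it is not, plan to enter
# passwords interactively as described in the note above.
expect_status() {
  if command -v expect >/dev/null 2>&1; then
    echo "expect found: passwords can be gathered once at the start"
  else
    echo "expect not found: enter passwords interactively when prompted"
  fi
}

expect_status
```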
While running pre- or post-upgrade checks, Oracle ORAchk and Oracle EXAchk automatically detect databases that are registered with Oracle Clusterware and present the list of databases to check.
Run the pre-upgrade checks during the upgrade planning phase. Oracle ORAchk and Oracle EXAchk prompt you for the version to which you are planning to upgrade:
$ ./orachk -u -o pre
$ ./exachk -u -o pre
After upgrading, run the post-upgrade checks:
$ ./orachk -u -o post
$ ./exachk -u -o post
• The tool starts collecting information across all the relevant components, including the remote nodes.
• The tool runs the health checks against the collected data and displays the results.
• After completing the health check run, the tool points to the location of the detailed HTML report and the .zip file that contains more output.
Related Topics:
• Running On-Demand With or Without the Daemon
When running on-demand, if the daemon is running, then the daemon answers all prompts where possible, including the passwords.
• Sending Results by Email
Optionally email the HTML report to one or more recipients using the -sendemail option.
• Expect - Home Page
2.4.2 Running Health Checks in Silent Mode
Run health checks automatically by scheduling them with the Automated Daemon Mode operation.
Note:
Silent mode operation is maintained for backwards compatibility for the customers who were using it before the daemon mode was available. Silent mode is limited in the checks it runs and Oracle does not actively enhance it any further.
• Running health checks in silent mode using the -s option does not run any checks on the storage servers and switches.
• Running health checks in silent mode using the -S option excludes the checks on database servers that require root access, and also does not run any checks on the storage servers and switches.
To run health checks silently, configure passwordless SSH equivalency. Passwordless SSH equivalency is not required when no remote checks are run, for example, when running against a single-instance database.
When health checks are run silently, the output is similar to that described in On-Demand Mode Operation.
Note:
If the tool is not configured to run in silent mode operation on an Oracle Engineered System, then the tool does not perform storage server or InfiniBand switch checks.
Including Health Checks that Require root Access
Run as root or configure sudo access to run health checks in silent mode and include checks that require root access.
To run health checks including checks that require root access, use the -s option followed by other required options:
$ ./orachk -s
$ ./exachk -s
Excluding Health Checks that Require root Access
To run health checks excluding checks that require root access, use the -S option followed by other required options:
$ ./orachk -S
$ ./exachk -S
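Silent mode is typically driven from cron. A hypothetical crontab entry (the install path and log location are assumptions; adjust them to your environment) that runs the -S form every Sunday at 3 AM:

```shell
# minute hour day-of-month month day-of-week  command
0 3 * * 0 /opt/orachk/orachk -S > /var/log/orachk_silent.log 2>&1
```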
Related Topics:
• Using Oracle ORAchk and Oracle EXAchk to Automatically Check for Risks and System Changes
Oracle recommends that you use the daemon process to schedule recurring health checks at regular intervals.
• Running Health Checks On-Demand
Usually, health checks run at scheduled intervals. However, Oracle recommends that you run health checks on-demand when needed.
2.4.3 Running On-Demand With or Without the Daemon
When running on-demand, if the daemon is running, then the daemon answers all prompts where possible, including the passwords.
To run health checks on-demand with or without the daemon:
1. To run health checks on-demand if the daemon is running, use the -daemon option.
$ ./orachk -daemon
$ ./exachk -daemon
2. To avoid connecting to the daemon process, so that the tool interactively prompts you as required, use the -nodaemon option.
$ ./orachk -nodaemon
$ ./exachk -nodaemon
Note:
If you are running database pre-upgrade checks (-u -o pre) and the daemon is running, then you must use the -nodaemon option.
2.4.4 Generating a Diff Report
The diff report attached to the previous email notification shows a summary of differences between the most recent runs.
To identify the changes since the last run:
1. Run the following command:
$ ./orachk -diff report_1 report_2
Review the diff report to see a baseline comparison of the two reports and then a list of differences.
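As an illustration, a small wrapper can select the two most recent collection directories to compare. The orachk_* directory naming here is an assumption; the sketch creates stand-in directories and only prints the command it would run.

```shell
# Create two stand-in collection directories so the sketch is self-contained;
# in practice these would be real report output directories.
demo=$(mktemp -d)
mkdir -p "$demo/orachk_node1_db_010117" "$demo/orachk_node1_db_020117"
touch -t 202401010000 "$demo/orachk_node1_db_010117"
touch -t 202401020000 "$demo/orachk_node1_db_020117"
cd "$demo"

# Newest directory first when sorted by modification time.
latest=$(ls -dt orachk_*/ | head -1)
previous=$(ls -dt orachk_*/ | head -2 | tail -1)
echo "./orachk -diff ${previous%/} ${latest%/}"
```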
2.4.5 Sending Results by Email
Optionally email the HTML report to one or more recipients using the -sendemail option.
To send health check run results by email:
1. Specify the recipients in the NOTIFICATION_EMAIL environment variable.
$ ./orachk -sendemail "NOTIFICATION_EMAIL=email_recipients"
$ ./exachk -sendemail "NOTIFICATION_EMAIL=email_recipients"
Where email_recipients is a comma-delimited list of email addresses.
2. Verify the email configuration settings using the -testemail option.
2.5 Managing the Oracle ORAchk and Oracle EXAchk Daemons
This section explains the procedures to manage the Oracle ORAchk and Oracle EXAchk daemons.
• Starting and Stopping the Daemon
Start and stop the daemon and force the daemon to stop a health check run.
• Configuring the Daemon for Automatic Restart
By default, you must manually restart the daemon if you restart the server or node on which the daemon is running.
• Setting and Getting Options for the Daemon
Set the daemon options before you start the daemon. Reset the daemon options anytime after starting the daemon.
• Querying the Status and Next Planned Daemon Run
Query the status and next automatic run schedule of the running daemon.
2.5.1 Starting and Stopping the Daemon
Start and stop the daemon and force the daemon to stop a health check run.
To start and stop the daemon:
1. To start the daemon, use the -d start option as follows:
$ ./orachk -d start
$ ./exachk -d start
The tools prompt you to provide required information during startup.
2. To stop the daemon, use the -d stop option as follows:
$ ./orachk -d stop
$ ./exachk -d stop
If a health check run is in progress when you run the stop command, then the daemon indicates so and continues running.
3. To force the daemon to stop a health check run, use the -d stop_client option:
$ ./orachk -d stop_client
$ ./exachk -d stop_client
The daemon stops the health check run and then confirms when it is done. If necessary, stop the daemon using the -d stop option.
2.5.2 Configuring the Daemon for Automatic Restart
By default, you must manually restart the daemon if you restart the server or node on which the daemon is running.
However, if you use the automatic restart option, the daemon restarts automatically after the server or node reboot.
Restarting the daemon automatically requires passwordless SSH user equivalence to root for the user who is configuring the auto-start feature, for example, root or oracle.
If passwordless SSH user equivalence is not in place, then Oracle ORAchk and Oracle EXAchk optionally configure it for you.
The passwordless SSH user equivalence is retained as long as the daemon automatic restart functionality is configured. Deconfiguring the daemon automatic restart feature restores the SSH configuration to the state it was in before automatic restart was configured.
To configure the daemon to start automatically:
1. To set up daemon automatic restart, use -initsetup:
$ ./orachk -initsetup
$ ./exachk -initsetup
The tool prompts you to provide the required information during startup.
Note:
Stop the daemon before running -initsetup, if the daemon is already running.
Pre-configure root user equivalence for all COMPUTE, STORAGE, or IBSWITCHES nodes using the -initpresetup option (root equivalency for COMPUTE nodes is mandatory for setting up the automatic restart functionality):
$ ./orachk -initpresetup
$ ./exachk -initpresetup
2. To query the automatic restart status of the daemon, use -initcheck:
$ ./orachk -initcheck
$ ./exachk -initcheck
3. To remove the automatic restart configuration, use -initrmsetup:
$ ./orachk -initrmsetup
$ ./exachk -initrmsetup
2.5.3 Setting and Getting Options for the Daemon
Set the daemon options before you start the daemon. Reset the daemon options anytime after starting the daemon.
To set the daemon options:
1. Set the daemon options using the -set option.
Set an option as follows:
$ ./orachk -set "option_1=option_1_value"
$ ./exachk -set "option_1=option_1_value"
Set multiple options using the name=value format separated by semicolons as follows:
$ ./orachk -set "option_1=option_1_value;option_2=option_2_value;option_n=option_n_value"
$ ./exachk -set "option_1=option_1_value;option_2=option_2_value;option_n=option_n_value"
• AUTORUN_SCHEDULE
Schedule recurring health check runs using the AUTORUN_SCHEDULE daemon option.
• AUTORUN_FLAGS
The AUTORUN_FLAGS daemon option determines how health checks are run.
• NOTIFICATION_EMAIL
Set the NOTIFICATION_EMAIL daemon option to send email notifications to the recipients you specify.
• collection_retention
Set the collection_retention daemon option to purge health check collection results that are older than a specified number of days.
• PASSWORD_CHECK_INTERVAL
The PASSWORD_CHECK_INTERVAL daemon option defines the frequency, in hours, for the daemon to validate the passwords entered when the daemon was started the first time.
• AUTORUN_INTERVAL
The AUTORUN_INTERVAL daemon option provides an alternative method of regularly running health checks.
• Setting Multiple Option Profiles for the Daemon
Use only one daemon process for each server. Do not start a single daemon on multiple databases in a cluster, or multiple daemons on the same database.
• Getting Existing Options for the Daemon
Query the values that you set for the daemon options.
2.5.3.1 AUTORUN_SCHEDULE
Schedule recurring health check runs using the AUTORUN_SCHEDULE daemon option.
To schedule recurring health check runs:
1. Set the AUTORUN_SCHEDULE option, as follows:
AUTORUN_SCHEDULE=hour day month day_of_week
where:
• hour is 0–23
• day is 1–31
• month is 1–12
• day_of_week is 0–6, where 0=Sunday and 6=Saturday
Use the asterisk (*) as a wildcard, and specify multiple values as comma-separated lists.
Table 2-1 AUTORUN_SCHEDULE
Example | Result
"AUTORUN_SCHEDULE=* * * *" | Runs every hour.
"AUTORUN_SCHEDULE=3 * * 0" | Runs at 3 AM every Sunday.
"AUTORUN_SCHEDULE=2 * * 1,3,5" | Runs at 2 AM on Monday, Wednesday, and Friday.
"AUTORUN_SCHEDULE=4 1 * *" | Runs at 4 AM on the first day of every month.
"AUTORUN_SCHEDULE=8,20 * * 1,2,3,4,5" | Runs at 8 AM and 8 PM every Monday, Tuesday, Wednesday, Thursday, and Friday.
For example:
$ ./orachk -set "AUTORUN_SCHEDULE=3 * * 0"
$ ./exachk -set "AUTORUN_SCHEDULE=3 * * 0"
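The four schedule fields follow the ranges listed above. A hypothetical helper (not part of the tools) that checks one field value, accepting * or a comma-separated list of integers within a range:

```shell
# Validate one AUTORUN_SCHEDULE field: "*" or comma-separated integers within
# [min, max]; for example, field_ok "1,3,5" 0 6 checks a day_of_week value.
field_ok() {
  value=$1 min=$2 max=$3
  [ "$value" = "*" ] && return 0
  old_ifs=$IFS; IFS=','; set -- $value; IFS=$old_ifs
  [ $# -ge 1 ] || return 1
  for n in "$@"; do
    case "$n" in *[!0-9]*|'') return 1 ;; esac
    [ "$n" -ge "$min" ] && [ "$n" -le "$max" ] || return 1
  done
}

field_ok 8,20 0 23 && echo "hour list ok"   # prints "hour list ok"
```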
2.5.3.2 AUTORUN_FLAGS
The AUTORUN_FLAGS daemon option determines how health checks are run.
To configure how health checks should run:
1. Set the AUTORUN_FLAGS option as follows:
AUTORUN_FLAGS=flags
where flags can be any combination of valid command-line flags.
Table 2-2 AUTORUN_FLAGS
Example | Result
"AUTORUN_FLAGS=-profile dba" | Runs only the dba profile checks.
"AUTORUN_FLAGS=-profile sysadmin -tag sysadmin" | Runs only the sysadmin profile checks and tags the output with the value sysadmin.
"AUTORUN_FLAGS=-excludeprofile ebs" | Runs all checks except the checks in the ebs profile.
For example:
$ ./orachk -set "AUTORUN_FLAGS=-profile sysadmin -tag sysadmin"
$ ./exachk -set "AUTORUN_FLAGS=-profile sysadmin -tag sysadmin"
2.5.3.3 NOTIFICATION_EMAIL
Set the NOTIFICATION_EMAIL daemon option to send email notifications to the recipients you specify.
The daemon notifies the recipients each time a health check run completes or when the daemon experiences a problem.
To configure email notifications:
1. Specify a comma-delimited list of email addresses, as follows:
$ ./orachk -set "NOTIFICATION_EMAIL=[email protected],[email protected]"
$ ./exachk -set "NOTIFICATION_EMAIL=[email protected],[email protected]"
2. Test the email notification configuration using the -testemail option, as follows:
$ ./orachk -testemail all
$ ./exachk -testemail all
After the first health check run, the daemon notifies the recipients with report output attached.
For the subsequent health check runs after the first email notification, the daemon emails the summary of differences between the most recent runs to all recipients specified in the NOTIFICATION_EMAIL list.
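Because NOTIFICATION_EMAIL takes a comma-delimited list, a quick sanity check on the list before passing it to -set can catch typos. This helper is a hypothetical, deliberately loose sketch, not part of the tools.

```shell
# Succeed only if every comma-separated entry has an @ with text on both sides.
valid_email_list() {
  old_ifs=$IFS; IFS=','; set -- $1; IFS=$old_ifs
  [ $# -ge 1 ] || return 1
  for addr in "$@"; do
    case "$addr" in ?*@?*) ;; *) return 1 ;; esac
  done
}

valid_email_list "first@example.com,second@example.org" && echo "list ok"   # prints "list ok"
```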
2.5.3.4 collection_retention
Set the collection_retention daemon option to purge health check collection results that are older than a specified number of days.
Note:
Specify the collection_retention option in lower case.
To configure the collection retention period:
1. Set the collection_retention option, as follows:
collection_retention=number_of_days
If you do not set this option, then the daemon does not purge stale collections.
2. Set the collection_retention option to an appropriate number of days based on:
• Frequency of your scheduled collections
• Size of the collection results
• Available disk space
For example:
$ ./orachk -set "collection_retention=60"
$ ./exachk -set "collection_retention=60"
2.5.3.5 PASSWORD_CHECK_INTERVAL
The PASSWORD_CHECK_INTERVAL daemon option defines the frequency, in hours, for the daemon to validate the passwords entered when the daemon was started the first time.
If an invalid password is found due to a password change, then the daemon stops, makes an entry in the daemon log, and then sends an email notification message to the recipients specified in the NOTIFICATION_EMAIL option.
To configure the password validation frequency:
1. Set the PASSWORD_CHECK_INTERVAL option, as follows:
PASSWORD_CHECK_INTERVAL=number_of_hours
If you do not set the PASSWORD_CHECK_INTERVAL option, then the daemon cannot actively check password validity and fails the next time the daemon tries to run after a password change. Using the PASSWORD_CHECK_INTERVAL option enables you to take corrective action and restart the daemon with the correct password rather than having failed collections.
2. Set the PASSWORD_CHECK_INTERVAL option to an appropriate number of hours based on:
• Frequency of your scheduled collections
• Password change policies
For example:
$ ./orachk -set "PASSWORD_CHECK_INTERVAL=1"
$ ./exachk -set "PASSWORD_CHECK_INTERVAL=1"
2.5.3.6 AUTORUN_INTERVAL
The AUTORUN_INTERVAL daemon option provides an alternative method of regularly running health checks.
Note:
The AUTORUN_SCHEDULE option supersedes the AUTORUN_INTERVAL option. The AUTORUN_INTERVAL option is retained for backwards compatibility. Oracle recommends that you use the AUTORUN_SCHEDULE option.
To configure recurring health check runs:
1. Set the AUTORUN_INTERVAL option, as follows:
AUTORUN_INTERVAL=n[d|h]
where:
• n is a number
• d is days
• h is hours
Table 2-3 AUTORUN_INTERVAL
Example | Result
"AUTORUN_INTERVAL=1h" | Runs every hour.
"AUTORUN_INTERVAL=12h" | Runs every 12 hours.
"AUTORUN_INTERVAL=1d" | Runs every day.
"AUTORUN_INTERVAL=7d" | Runs every week.
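When comparing interval settings, it can help to normalize them to hours. A hypothetical converter mirroring the n[d|h] format shown in Table 2-3:

```shell
# Convert an AUTORUN_INTERVAL value such as 12h or 7d into a number of hours.
interval_hours() {
  n=${1%?}            # numeric part
  unit=${1#"$n"}      # trailing unit: d or h
  case "$n" in *[!0-9]*|'') return 1 ;; esac
  case "$unit" in
    h) echo "$n" ;;
    d) echo $((n * 24)) ;;
    *) return 1 ;;
  esac
}

interval_hours 7d   # prints 168
```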
2.5.3.7 Setting Multiple Option Profiles for the Daemon
Use only one daemon process for each server. Do not start a single daemon on multiple databases in a cluster, or multiple daemons on the same database.
The daemon does not start if it detects another Oracle ORAchk or Oracle EXAchk daemon process running locally.
Define multiple different run profiles using the same daemon. Defining multiple different run profiles enables you to run multiple different health checks with different daemon options, such as different schedules, email notifications, and automatic run flags. The daemon manages all profiles.
To set multiple option profiles for the daemon:
1. Define daemon option profiles using the -id id option before the -set option, where id is the name of the profile:
$ ./orachk -id id -set "option=value"
$ ./exachk -id id -set "option=value"
For example, if the database administrator wants to run checks within the dba profile and the system administrator wants to run checks in the sysadmin profile, then configure the daemon using the profiles option.
Define the database administrator profile as follows:
$ ./orachk -id dba -set "NOTIFICATION_EMAIL=[email protected];\
AUTORUN_SCHEDULE=4,8,12,16,20 * * *;AUTORUN_FLAGS=-profile dba -tag dba;\
collection_retention=30"
Created notification_email for ID[dba]
Created autorun_schedule for ID[dba]
Created autorun_flags for ID[dba]
Created collection_retention for ID[dba]
$ ./exachk -id dba -set "NOTIFICATION_EMAIL=[email protected];\
AUTORUN_SCHEDULE=4,8,12,16,20 * * *;AUTORUN_FLAGS=-profile dba -tag dba;\
collection_retention=30"
Created notification_email for ID[dba]
Created autorun_schedule for ID[dba]
Created autorun_flags for ID[dba]
Created collection_retention for ID[dba]
Define the system administrator profile as follows:
$ ./orachk -id sysadmin -set "NOTIFICATION_EMAIL=[email protected];\
AUTORUN_SCHEDULE=3 * * 1,3,5;AUTORUN_FLAGS=-profile sysadmin -tag sysadmin;\
collection_retention=60"
Created notification_email for ID[sysadmin]
Created autorun_schedule for ID[sysadmin]
Created autorun_flags for ID[sysadmin]
Created collection_retention for ID[sysadmin]
$ ./exachk -id sysadmin -set "NOTIFICATION_EMAIL=[email protected];\
AUTORUN_SCHEDULE=3 * * 1,3,5;AUTORUN_FLAGS=-profile sysadmin -tag sysadmin;\
collection_retention=60"
Created notification_email for ID[sysadmin]
Created autorun_schedule for ID[sysadmin]
Created autorun_flags for ID[sysadmin]
Created collection_retention for ID[sysadmin]
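Because each profile is addressed with its own -id, repeated settings can be applied in a loop. The sketch below only prints the commands it would run; the profile names and retention value are illustrative.

```shell
# Print one -set command per daemon option profile; drop the echo to run them.
for id in dba sysadmin; do
  echo ./orachk -id "$id" -set "collection_retention=30"
done
```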
2.5.3.8 Getting Existing Options for the Daemon
Query the values that you set for the daemon options.
To query the values, use [-id ID] -get option | all
where:
• ID is a daemon option profile
• option is a specific daemon option you want to retrieve
• all returns the values of all options
To get existing options for the daemon:
1. To get a specific daemon option, for example:
$ ./orachk -get NOTIFICATION_EMAIL
ID: orachk.default
------------------------------------------
notification_email = [email protected]
$ ./exachk -get NOTIFICATION_EMAIL
ID: exachk.default
------------------------------------------
notification_email = [email protected]
2. To query multiple daemon option profiles, for example:
$ ./orachk -get NOTIFICATION_EMAIL
ID: orachk.default
------------------------------------------
notification_email = [email protected]
ID: dba
------------------------------------------
notification_email = [email protected]
ID: sysadmin
------------------------------------------
notification_email = [email protected]
$ ./exachk -get NOTIFICATION_EMAIL
ID: exachk.default
------------------------------------------
notification_email = [email protected]
ID: dba
------------------------------------------
notification_email = [email protected]
ID: sysadmin
------------------------------------------
notification_email = [email protected]
3. To limit the request to a specific daemon option profile, use the -id ID -get option option.
For example, to get the NOTIFICATION_EMAIL for a daemon profile called dba:
$ ./orachk -id dba -get NOTIFICATION_EMAIL
ID: dba
------------------------------------------
notification_email = [email protected]
$ ./exachk -id dba -get NOTIFICATION_EMAIL
ID: dba
------------------------------------------
notification_email = [email protected]
4. To get all options set, use the -get all option.
For example:
$ ./orachk -get all
ID: orachk.default
------------------------------------------
notification_email = [email protected]
autorun_schedule = 3 * * 0
collection_retention = 30
password_check_interval = 1
$ ./exachk -get all
ID: exachk.default
------------------------------------------
notification_email = [email protected]
autorun_schedule = 3 * * 0
collection_retention = 30
password_check_interval = 1
5. To query all daemon option profiles, for example:
$ ./orachk -get all
ID: orachk.default
------------------------------------------
notification_email = [email protected]
autorun_schedule = 3 * * 0
collection_retention = 30
password_check_interval = 12
ID: dba
------------------------------------------
notification_email = [email protected]
autorun_schedule = 4,8,12,16,20 * * *
autorun_flags = -profile dba -tag dba
collection_retention = 30
password_check_interval = 1
ID: sysadmin
------------------------------------------
notification_email = [email protected]
autorun_schedule = 3 * * 1,3,5
autorun_flags = -profile sysadmin -tag sysadmin
collection_retention = 60
password_check_interval = 1
$ ./exachk -get all
ID: exachk.default
------------------------------------------
notification_email = [email protected]
autorun_schedule = 3 * * 0
collection_retention = 30
password_check_interval = 1
ID: dba
------------------------------------------
notification_email = [email protected]
autorun_schedule = 4,8,12,16,20 * * *
autorun_flags = -profile dba -tag dba
collection_retention = 30
password_check_interval = 1
ID: sysadmin
------------------------------------------
notification_email = [email protected]
autorun_schedule = 3 * * 1,3,5
autorun_flags = -profile sysadmin -tag sysadmin
collection_retention = 60
password_check_interval = 1
6. To get all the options set for a daemon profile, for example, a daemon profile called dba:
$ ./orachk -id dba -get all
ID: dba
------------------------------------------
notification_email = [email protected]
autorun_schedule = 4,8,12,16,20 * * *
autorun_flags = -profile dba -tag dba
collection_retention = 30
password_check_interval = 1
$ ./exachk -id dba -get all
ID: dba
------------------------------------------
notification_email = [email protected]
autorun_schedule = 4,8,12,16,20 * * *
autorun_flags = -profile dba -tag dba
collection_retention = 30
password_check_interval = 1
2.5.4 Querying the Status and Next Planned Daemon Run
Query the status and next automatic run schedule of the running daemon.
-d status|info|nextautorun
•
-d status
: Checks if the daemon is running.
•
-d info
: Displays information about the running daemon.
•
-d nextautorun [-id ID]
: Displays the next automatic run time.
To query the status and next planned daemon run:

1. To check if the daemon is running, use -d status:

$ ./orachk -d status
$ ./exachk -d status

If the daemon is running, then the daemon confirms and displays the PID.

2. To query more detailed information about the daemon, use -d info:

$ ./orachk -d info
$ ./exachk -d info

The daemon responds with the following information:
• Node on which the daemon is installed
• Version
• Install location
• Time when the daemon was started

3. To query the next scheduled health check run, use -d nextautorun:

$ ./orachk -d nextautorun
$ ./exachk -d nextautorun

The daemon responds with details of the schedule. If you have configured multiple daemon option profiles, then the output shows whichever profile is scheduled to run next.

If you have configured multiple daemon option profiles, then query the next scheduled health check run of a specific profile using -id ID -d nextautorun:

$ ./orachk -id ID -d nextautorun
$ ./exachk -id ID -d nextautorun

The daemon responds with details of the schedule for the daemon option profile ID that you specified.
2.6 Tracking Support Incidents
The Incidents tab gives you a complete system for tracking support incidents.
• Specify contact details of each customer, products and categories, and then set up values to limit status codes, severity, and urgency attributes for an incident
• Raise a new ticket by clicking the Delta (Δ) symbol. Oracle Health Check
Collections Manager displays the delta symbol only in the Collections and
Browse tabs
• The Browse tab enables you to create a new ticket on individual checks
• The Collections tab enables you to create a single ticket for the entire collection
• The Delta (Δ) symbol is color coded red, blue, and green based on the ticket status:
– RED (No Incident ticket exists): Initiates the process to create a new incident ticket for the collection or individual checks
– BLUE (An open Incident ticket exists): Opens the incident ticket for editing
– GREEN (A closed Incident ticket exists): Opens the closed incident ticket for viewing
• Track the progress of the ticket in an update area of the ticket, or add attachments and links to the incident
• Use tags to classify incidents and use the resulting tag cloud in your reports
• Incident access and management happen only within your access control range
Note:
The Incident Tracking feature is a basic stand-alone system, and it is not designed for integration with other commercial enterprise-level trouble ticketing systems.
Figure 2-7 Incidents Tab
Incident Tracking Features
• Search options
• Track and analyze incident tickets
• Flexible and updateable incident status
• Robust reporting
• Link, Note, and File Attachments
• Flexible Access Control (reader, contributor, administrator model)
Related Topics:
• Creating or Editing Incident Tickets
  Create or edit incident tickets for individual checks or for an entire collection.
• Oracle ORAchk and EXAchk User’s Guide
2.7 Tracking File Attribute Changes and Comparing Snapshots
Use the Oracle ORAchk and Oracle EXAchk -fileattr option and command flags to record and track file attribute settings, and compare snapshots.
Changes to the attributes of files, such as owner, group, or permissions, can cause unexpected consequences. Proactively monitor and mitigate these issues before your business is impacted.
• Using the File Attribute Check With the Daemon
  You must have Oracle Grid Infrastructure installed and running before you use -fileattr.
• Taking File Attribute Snapshots
  By default, Oracle Grid Infrastructure homes and all the installed Oracle Database homes are included in the snapshots.
• Including Directories to Check
  Include directories in the file attribute changes check.
• Excluding Directories from Checks
  Exclude directories from file attribute changes checks.
• Rechecking Changes
  Compare the new snapshot with the previous one to track changes.
• Designating a Snapshot As a Baseline
  Designate a snapshot as a baseline to compare with other snapshots.
• Restricting System Checks
  Restrict Oracle ORAchk and Oracle EXAchk to perform only file attribute changes checks.
• Removing Snapshots
  Remove the snapshots diligently.
2.7.1 Using the File Attribute Check With the Daemon
You must have Oracle Grid Infrastructure installed and running before you use -fileattr.
To use the file attribute check with the daemon:

1. Start the daemon.

./orachk -d start

2. Start the client run with the -fileattr options.

./orachk -fileattr start -includedir "/root/myapp,/etc/oratab" -excludediscovery
./orachk -fileattr check -includedir "/root/myapp,/etc/oratab" -excludediscovery

3. Specify the output directory to store snapshots with the -output option.

./orachk -fileattr start -output "/tmp/mysnapshots"

4. Specify a descriptive name for the snapshot with the -tag option to identify your snapshots.

For example:

./orachk -fileattr start -tag "BeforeXYZChange"
Generated snapshot directory-
orachk_myserver65_20160329_052056_BeforeXYZChange
2.7.2 Taking File Attribute Snapshots
By default, Oracle Grid Infrastructure homes and all the installed Oracle Database homes are included in the snapshots.
To take file attribute snapshots:
1. To start the first snapshot, run the -fileattr start command.

./orachk -fileattr start
./exachk -fileattr start
$ ./orachk -fileattr start
CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to /u01/app/11.2.0.4/grid?[y/n][y]
Checking ssh user equivalency settings on all nodes in cluster
Node mysrv22 is configured for ssh user equivalency for oradb user
Node mysrv23 is configured for ssh user equivalency for oradb user
List of directories(recursive) for checking file attributes:
/u01/app/oradb/product/11.2.0/dbhome_11202
/u01/app/oradb/product/11.2.0/dbhome_11203
/u01/app/oradb/product/11.2.0/dbhome_11204

orachk has taken snapshot of file attributes for above directories at: /orahome/oradb/orachk/orachk_mysrv21_20160504_041214
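Conceptually, the snapshot is a recursive record of ownership and permission attributes for every file under the listed directories. A minimal Python sketch of the same idea (the tools' actual snapshot format is internal and not documented here):

```python
import os
import stat

def snapshot_attributes(root_dir):
    """Recursively record (mode, uid, gid) for every file under root_dir,
    mirroring the kind of data -fileattr start captures."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            # store permission bits in octal plus numeric owner/group
            snapshot[path] = (oct(stat.S_IMODE(st.st_mode)), st.st_uid, st.st_gid)
    return snapshot

if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "oratab"), "w").close()
        print(snapshot_attributes(tmp))
```

A snapshot dict like this can later be compared against a second one to detect owner, group, or permission drift.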
2.7.3 Including Directories to Check
Include directories in the file attribute changes check.
To include directories to check:
1. Run the file attribute changes check command with the -includedir directories option, where directories is a comma-delimited list of directories to include in the check.
For example:
./orachk -fileattr start -includedir "/home/oradb,/etc/oratab"
./exachk -fileattr start -includedir "/home/oradb,/etc/oratab"
$ ./orachk -fileattr start -includedir "/root/myapp/config/"
CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to /u01/app/12.2.0/grid?[y/n][y]
Checking for prompts on myserver18 for oragrid user...
Checking ssh user equivalency settings on all nodes in cluster
Node myserver17 is configured for ssh user equivalency for root user
List of directories(recursive) for checking file attributes:
/u01/app/12.2.0/grid
/u01/app/oradb/product/12.2.0/dbhome_1
/u01/app/oradb2/product/12.2.0/dbhome_1
/root/myapp/config/

orachk has taken snapshot of file attributes for above directories at: /root/orachk/orachk_myserver18_20160511_032034
2.7.4 Excluding Directories from Checks
Exclude directories from file attribute changes checks.
To exclude directories from checks:
1. Run the file attribute changes check command with the -excludediscovery option to exclude directories that you do not list in the -includedir discovery list.
For example:
$ ./orachk -fileattr start -includedir "/root/myapp/config/" -excludediscovery
CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to /u01/app/12.2.0/grid?[y/n][y]
Checking for prompts on myserver18 for oragrid user...
Checking ssh user equivalency settings on all nodes in cluster
Node myserver17 is configured for ssh user equivalency for root user
List of directories(recursive) for checking file attributes:
/root/myapp/config/

orachk has taken snapshot of file attributes for above directories at: /root/orachk/orachk_myserver18_20160511_032209
2.7.5 Rechecking Changes
Compare the new snapshot with the previous one to track changes.
To recheck changes:
1. Run the file attribute changes check command with the check option to take a new snapshot, and run a normal health check collection.

The -fileattr check command compares the new snapshot with the previous snapshot.
For example:
./orachk -fileattr check
./exachk -fileattr check
Note:
To obtain an accurate comparison between the snapshots, you must use -fileattr check with the same options that you used with the previous snapshot collection that you obtained with -fileattr start.
For example, if you obtained your first snapshot by using the options -includedir "/somedir" -excludediscovery when you ran -fileattr start, then you must include the same options with -fileattr check to obtain an accurate comparison.
$ ./orachk -fileattr check -includedir "/root/myapp/config" -excludediscovery
CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to /u01/app/12.2.0/grid?[y/n][y]
Checking for prompts on myserver18 for oragrid user...
Checking ssh user equivalency settings on all nodes in cluster
Node myserver17 is configured for ssh user equivalency for root user
List of directories(recursive) for checking file attributes:
/root/myapp/config
Checking file attribute changes...
.
"/root/myapp/config/myappconfig.xml" is different:
Baseline : 0644 oracle root /root/myapp/config/myappconfig.xml
Current : 0644 root root /root/myapp/config/myappconfig.xml
...
Results of the file attribute changes are reflected in the File Attribute Changes section of the HTML output report.
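The comparison reports each file whose attributes differ between the baseline and current snapshots, as in the myappconfig.xml example above. A sketch of that comparison over two snapshot dicts mapping path to (mode, owner, group); this representation is an assumption, since the tools' real snapshot format is internal:

```python
def diff_snapshots(baseline, current):
    """Return the paths whose (mode, owner, group) tuples differ between
    two snapshots, in the spirit of the -fileattr check report."""
    differences = {}
    for path, base_attrs in baseline.items():
        cur_attrs = current.get(path)
        if cur_attrs is not None and cur_attrs != base_attrs:
            differences[path] = {"baseline": base_attrs, "current": cur_attrs}
    return differences

if __name__ == "__main__":
    # mirrors the report above: owner changed from oracle to root
    baseline = {"/root/myapp/config/myappconfig.xml": ("0644", "oracle", "root")}
    current = {"/root/myapp/config/myappconfig.xml": ("0644", "root", "root")}
    for path, change in diff_snapshots(baseline, current).items():
        print('"%s" is different:' % path)
        print("Baseline :", *change["baseline"])
        print("Current  :", *change["current"])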
2.7.6 Designating a Snapshot As a Baseline
Designate a snapshot as a baseline to compare with other snapshots.
To designate a snapshot as a baseline:
1. Run the file attribute changes check command with the -baseline path_to_snapshot option.

The -baseline path_to_snapshot command compares a specific baseline snapshot with other snapshots, if you have multiple different baselines to check.

./orachk -fileattr check -baseline path_to_snapshot
./exachk -fileattr check -baseline path_to_snapshot
For example:
./orachk -fileattr check -baseline "/tmp/Snapshot"
2.7.7 Restricting System Checks
Restrict Oracle ORAchk and Oracle EXAchk to perform only file attribute changes checks.
By default, -fileattr check also performs a full health check run.
To restrict system checks:
1. Run the file attribute changes check command with the -fileattronly option.

./orachk -fileattr check -fileattronly
./exachk -fileattr check -fileattronly
2.7.8 Removing Snapshots
Remove the snapshots diligently.
To remove snapshots:
1. Run the file attribute changes check command with the remove option:

./orachk -fileattr remove
./exachk -fileattr remove
For example:
$ ./orachk -fileattr remove
CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to /u01/app/12.2.0/grid?[y/n][y]y
Checking for prompts on myserver18 for oragrid user...
Checking ssh user equivalency settings on all nodes in cluster
Node myserver17 is configured for ssh user equivalency for root user
List of directories(recursive) for checking file attributes:
/u01/app/12.2.0/grid
/u01/app/oradb/product/12.2.0/dbhome_1
/u01/app/oradb2/product/12.2.0/dbhome_1
Removing file attribute related files...
...
2.8 Collecting and Consuming Health Check Data
Oracle Health Check Collections Manager for Oracle Application Express 4.2 provides an enterprise-wide view of your health check collection data.
• Selectively Capturing Users During Login
  Configure Oracle Health Check Collections Manager to capture user details and assign the users Oracle Health Check Collections Manager roles.
• Bulk Mapping Systems to Business Units
  Oracle Health Check Collections Manager provides an XML bulk upload option so that you can quickly map many systems to business units.
• Adjusting or Disabling Old Collections Purging
  Modify or disable the purge schedule for Oracle Health Check Collections Manager collection data.
• Uploading Collections Automatically
  Configure Oracle ORAchk and Oracle EXAchk to upload check results automatically to the Oracle Health Check Collections Manager database.
• Viewing and Reattempting Failed Uploads
  Configure Oracle ORAchk and Oracle EXAchk to display and reattempt to upload the failed uploads.
• Authoring User-Defined Checks
  Define, test, and maintain your own checks that are specific to your environment.
• Finding Which Checks Require Privileged Users
  Use the Privileged User filter in the Health Check Catalogs to find health checks that must be run by privileged users, such as root.
• Creating or Editing Incident Tickets
  Create or edit incident tickets for individual checks or for an entire collection.
• Viewing Clusterwide Linux Operating System Health Check (VMPScan)
  On Linux systems, view a summary of the VMPScan report in the Clusterwide Linux Operating System Health Check (VMPScan) section of the Health Check report.
Related Topics:
• Oracle ORAchk and EXAchk User’s Guide
2.8.1 Selectively Capturing Users During Login
Configure Oracle Health Check Collections Manager to capture user details and assign the users Oracle Health Check Collections Manager roles.
Automatically capturing users during login automates user management. You need not create users manually.
By default, Oracle Health Check Collections Manager:
• Captures details of users that are logging in with LDAP authentication
• Assigns them Oracle Health Check Collections Manager roles, for example, DBA role.
Note:
The Oracle Health Check Collections Manager roles are specific to Oracle
Health Check Collections Manager and do not equate to system privileges.
For example, the DBA role is not granted SYSDBA system privilege.
However, you can disable automatic capture and re-enable it at any time. If you disable automatic capture, then you must manually create users and assign them roles.
To enable or disable capturing user details automatically:
1. Click Administration, and then select Manage Users, User Roles and assign System to users.

Figure 2-8 Manage Users, User Roles and assign System to users

2. To disable automatic capture of user details, click Don’t Capture User Details (When Login).

Figure 2-9 Don’t Capture User Details (When Login)

3. To re-enable automatic capture of user details, click Capture User Details (When Login).

Figure 2-10 Capture User Details (When Login)
2.8.2 Bulk Mapping Systems to Business Units
Oracle Health Check Collections Manager provides an XML bulk upload option so that you can quickly map many systems to business units.
To bulk map systems to the business units:

1. Click Administration, then select Assign System to Business Unit.

Figure 2-11 Assign System to Business Unit

2. Click Bulk Mapping.

Figure 2-12 Bulk Mapping

3. Upload a mapping XML:

a. Click Generate XML File (Current Mapping).
b. Download the resulting XML file that contains your current system to business unit mappings.

Figure 2-13 Upload a mapping XML

c. Amend the XML to show the mappings that you want.
d. Upload the new mapping XML through Upload Mapping (XML File).
2.8.3 Adjusting or Disabling Old Collections Purging
Modify or disable the purge schedule for Oracle Health Check Collections Manager collection data.
By default, Oracle Health Check Collections Manager purges collections older than three months.
To adjust or disable the collection purging frequency:

1. Click Administration, and then select Manage Email Server & Job Details.

Figure 2-14 Manage Email Server and Job Details

2. Select an appropriate option:
• To change the frequency of purges, set different values in Purge Frequency, and then click Click To Purge Every.
• To disable purging, click Click To Disable Purging.
• To re-enable purging, click Click To Enable Purging.

Figure 2-15 Configure Purging
2.8.4 Uploading Collections Automatically
Configure Oracle ORAchk and Oracle EXAchk to upload check results automatically to the Oracle Health Check Collections Manager database.
Specify the connection string and the password to connect to the database. Oracle
Health Check Collections Manager stores the connection details in an encrypted wallet.
To configure Oracle ORAchk and Oracle EXAchk to upload check results automatically:

1. Specify the connection details using the -setdbupload option. For default options, use -setdbupload all.

orachk -setdbupload all
exachk -setdbupload all

Oracle Health Check Collections Manager prompts you to enter the values for the connection string and password, and stores these values in an encrypted wallet file.

2. Verify the values set in the wallet, using the -getdbupload option.

orachk -getdbupload
exachk -getdbupload

Oracle ORAchk and Oracle EXAchk automatically use the default values set in the RAT_UPLOAD_USER and RAT_ZIP_UPLOAD_TABLE environment variables.

3. Verify, using the -checkdbupload option, if Oracle ORAchk and Oracle EXAchk successfully connect to the database.

orachk -checkdbupload
exachk -checkdbupload

4. Set database uploads for Oracle ORAchk and Oracle EXAchk check results.

orachk -setdbupload all
exachk -setdbupload all

Note:
Use a fully qualified address for the connect string as mentioned in the previous example. Do not use an alias from the tnsnames.ora file.
Using a fully qualified address eliminates the need to rely on tnsnames.ora file name resolution on all the servers where you run the tool.

5. Review Oracle ORAchk and Oracle EXAchk database check result uploads.

orachk -getdbupload
exachk -getdbupload

Example 2-1 Checking Oracle ORAchk and Oracle EXAchk Check Result Uploads

$ ./orachk -checkdbupload
Configuration is good to upload result to database.
At the end of health check collection, Oracle ORAchk and Oracle EXAchk check if the required connection details are set (in the wallet or the environment variables). If the connection details are set properly, then Oracle ORAchk and Oracle EXAchk upload the collection results.
To configure many Oracle ORAchk and Oracle EXAchk instances:

1. Create the wallet once with the -setdbupload all option, then enter the values when prompted.

2. Copy the resulting wallet directory to each Oracle ORAchk and Oracle EXAchk instance directory.

You can also set the environment variable RAT_WALLET_LOC to point to the location of the wallet directory.
Other configurable upload values are:

• RAT_UPLOAD_USER: Controls which user to connect as (default is ORACHKCM).
• RAT_UPLOAD_TABLE: Controls the table name to store non-zipped collection results in (not used by default).
• RAT_PATCH_UPLOAD_TABLE: Controls the table name to store non-zipped patch results in (not used by default).
• RAT_UPLOAD_ORACLE_HOME: Controls the ORACLE_HOME used while establishing the connection and uploading.
  By default, the ORACLE_HOME environment variable is set to the Oracle Grid Infrastructure Grid home that Oracle ORAchk and Oracle EXAchk discover.

RCA13_DOCS: Not configurable for use with Oracle Health Check Collections Manager, because RCA13_DOCS is the table that Oracle Health Check Collections Manager looks for.

RAT_UPLOAD_TABLE and RAT_PATCH_UPLOAD_TABLE: Not used by default because the zipped collection details are stored in RCA13_DOCS.

Configure the RAT_UPLOAD_TABLE and RAT_PATCH_UPLOAD_TABLE environment variables if you are using your own custom application to view the collection results.

You can also set these values in the wallet. For example:

$ ./orachk -setdbupload all
$ ./exachk -setdbupload all

This prompts you for, and sets, the RAT_UPLOAD_CONNECT_STRING and RAT_UPLOAD_PASSWORD. Then use:

$ ./orachk -setdbupload RAT_UPLOAD_TABLE,RAT_PATCH_UPLOAD_TABLE
$ ./exachk -setdbupload RAT_UPLOAD_TABLE,RAT_PATCH_UPLOAD_TABLE

Note:
Alternatively, set all the values in the wallet using environment variables. If you set the values using the environment variable RAT_UPLOAD_CONNECT_STRING, then enclose the values in double quotes.
For example:

export RAT_UPLOAD_CONNECT_STRING="(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=myserver44.example.com)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orachkcm.example.com)))"
2.8.5 Viewing and Reattempting Failed Uploads
Configure Oracle ORAchk and Oracle EXAchk to display and reattempt to upload the failed uploads.
The tools store the values in the collection_dir/outfiles/check_env.out file to record if the previous database upload was successful or not.

The following example shows that database upload has been set up, but the last upload was unsuccessful:
DATABASE_UPLOAD_SETUP=1
DATABASE_UPLOAD_STATUS=0
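A custom wrapper script can read these two flags to decide whether a reattempt is worthwhile. A minimal sketch, assuming check_env.out holds simple KEY=VALUE lines as in the example above:

```python
def upload_needs_retry(check_env_text):
    """Return True when database upload is configured
    (DATABASE_UPLOAD_SETUP=1) but the last upload failed
    (DATABASE_UPLOAD_STATUS=0)."""
    values = {}
    for line in check_env_text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values.get("DATABASE_UPLOAD_SETUP") == "1" and \
           values.get("DATABASE_UPLOAD_STATUS") == "0"

if __name__ == "__main__":
    sample = "DATABASE_UPLOAD_SETUP=1\nDATABASE_UPLOAD_STATUS=0\n"
    print(upload_needs_retry(sample))  # True: set up, but last upload failed
```

When this returns True, the -checkfaileduploads and -uploadfailed options described below apply.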
To view and reattempt failed uploads:
1. To view failed collections, use the -checkfaileduploads option.

./orachk -checkfaileduploads
./exachk -checkfaileduploads

For example:

$ ./orachk -checkfaileduploads
List of failed upload collections
/home/oracle/orachk_myserver_042016_232011.zip
/home/oracle/orachk_myserver_042016_231732.zip
/home/oracle/orachk_myserver_042016_230811.zip
/home/oracle/orachk_myserver_042016_222227.zip
/home/oracle/orachk_myserver_042016_222043.zip

2. To reattempt collection upload, use the -uploadfailed option.

Specify either all to upload all collections, or a comma-delimited list of collections:

./orachk -uploadfailed all|list of failed collections
./exachk -uploadfailed all|list of failed collections

For example:

./orachk -uploadfailed "/home/oracle/orachk_myserver_042016_232011.zip,/home/oracle/orachk_myserver_042016_231732.zip"
Note:
You cannot upload collections that were uploaded earlier because of the SQL unique constraint.
2.8.6 Authoring User-Defined Checks
Define, test, and maintain your own checks that are specific to your environment.
Oracle supports the framework for creating and running user-defined checks, but not the logic of the checks. It is your responsibility to test, verify, author, maintain, and support user-defined checks. At runtime, Oracle ORAchk and Oracle EXAchk script run the user-defined checks and display the results in the User Defined Checks section of the HTML report.
The user-defined checks are stored in the Oracle Health Check Collections Manager schema and output to an XML file, which is co-located with the ORAchk script. When run on your system, ORAchk 12.1.0.2.5 and later tries to find the XML file. If found, then Oracle ORAchk runs the checks contained therein and includes the results in the standard HTML report.
To author user-defined checks:
1. Click the User Defined Checks tab, then select Add New Check.

Figure 2-16 User-Defined Checks Tab
2. Select OS Check or SQL Check as the Audit Check Type.

Operating system checks use a system command to determine the check status. SQL checks run an SQL statement to determine the check status.

Figure 2-17 User-Defined Checks Tab - Audit Check Type

Once you have selected an Audit Check Type, Oracle Health Check Collections Manager updates the applicable fields.

At any time during authoring, click the title of a field to see help documentation specific to that field.

Operating system and SQL commands are supported. Running user-defined checks as root is NOT supported.
Figure 2-18 User-Defined Checks Tab - Audit Check Type - OS Check
Once a check is created, the check is listed in the Available Audit Checks section.
Filter the checks using the filters on this page.
Figure 2-19 User-Defined Checks Tab - Available Audit Checks
3. Click Generate XML.

On the right, find a link to download the generated user_defined_checks.xml file.

The generated XML file includes all the checks that have been authored and have not been placed on hold. Placing checks on hold is equivalent to a logical delete. If there is a problem with a check or the logic is not perfect, then place the check on hold. A check that is placed on hold is not included in the XML file. If the check is production ready, then remove the hold to include the check the next time the XML file is generated.

4. Download and save the user_defined_checks.xml file into the same directory as the Oracle ORAchk and Oracle EXAchk tools.

Oracle ORAchk and Oracle EXAchk run the user-defined checks the next time they run.
Figure 2-20 User-Defined Checks Tab - Download User-Defined Checks
5. Alternatively, to run only the user-defined checks, use the profile user_defined_checks.

When this option is used, the user-defined checks are the only checks run, and the User Defined Checks section is the only section with results displayed in the report.

./orachk -profile user_defined_checks
./exachk -profile user_defined_checks

6. To omit the user-defined checks at runtime, use the -excludeprofile option.

./orachk -excludeprofile user_defined_checks
./exachk -excludeprofile user_defined_checks
2.8.7 Finding Which Checks Require Privileged Users
Use the Privileged User filter in the Health Check Catalogs to find health checks that must be run by privileged users, such as root.
Enable JavaScript before you view the Health Check Catalogs.
To filter health checks by privileged users:

1. Go to My Oracle Support note 1268927.2.
2. Click the Health Check Catalog tab.
3. Click Open ORAchk Health Check Catalog to open or download the ORAchk_Health_Check_Catalog.html file.
4. Click the Privileged User drop-down list, and then clear or select the check boxes appropriately.
Figure 2-21 Oracle ORAchk - Privileged User
Related Topics:
• https://support.oracle.com/rs?type=doc&id=1268927.2
2.8.8 Creating or Editing Incident Tickets
Create or edit incident tickets for individual checks or for an entire collection.
Oracle Health Check Collections Manager represents the statuses of each ticket with different colored icons. To act upon the tickets, click the icons.
• Creating Incident Tickets
• Editing Incident Tickets
2.8.8.1 Creating Incident Tickets
To create incident tickets:

1. Click the Delta (Δ) symbol colored RED.
2. Add your ticket details.
3. Click Next.
4. Select the Product and Product Version.
5. Click Next.
6. Select the Urgency of the ticket.
7. Select the Severity of the ticket.
8. Select the Status of the ticket.
9. Select the Category of the ticket.
10. Enter a summary and description of the incident.
11. Click Create Ticket.
2.8.8.2 Editing Incident Tickets

To edit incident tickets:

1. Click the Incident tab.
2. Click Open Tickets.
3. Click the ticket.
4. Click Edit Ticket.
5. Alter the required details, and then click Apply Changes.
Note:
Click the delta symbol colored GREEN in the Collections or Browse tabs to edit incident tickets.
2.8.9 Viewing Clusterwide Linux Operating System Health Check (VMPScan)
On Linux systems, view a summary of the VMPScan report in the Clusterwide Linux
Operating System Health Check (VMPScan) section of the Health Check report.
The full VMPScan report is also available within the collection/reports and collection/outfiles/vmpscan directories.
Figure 2-22 Clusterwide Linux Operating System Health Check (VMPScan)
Note:
The VMPScan report is included only when Oracle ORAchk is run on Linux systems.
2.9 Locking and Unlocking Storage Server Cells
Beginning with version 12.1.0.2.7, use Oracle EXAchk to lock and unlock storage server cells.
On the database server, if you configure passwordless SSH equivalency for the user that launched Oracle EXAchk to the root
user on each storage server, then Oracle
EXAchk uses SSH equivalency to complete the storage server checks. Run Oracle
EXAchk from the Oracle Exadata storage server, if there is no SSH connectivity from the database to the storage server.
To lock and unlock cells, use the -unlockcells and -lockcells options for Oracle Exadata, Oracle SuperCluster, and Zero Data Loss Recovery Appliance.
./exachk -unlockcells all | -cells [comma-delimited list of cell names or cell IPs]
./exachk -lockcells all | -cells [comma-delimited list of cell names or cell IPs]
2.10 Integrating Health Check Results with Other Tools
Integrate health check results from Oracle ORAchk and Oracle EXAchk into Enterprise
Manager and other third-party tools.
• Integrating Health Check Results with Oracle Enterprise Manager
  Integrate health check results from Oracle ORAchk and Oracle EXAchk into Oracle Enterprise Manager.
• Integrating Health Check Results with Third-Party Tool
  Integrate health check results from Oracle ORAchk and Oracle EXAchk into various third-party log monitoring and analytics tools, such as Elasticsearch and Kibana.
• Integrating Health Check Results with Custom Application
  Oracle ORAchk and Oracle EXAchk upload collection results from multiple instances into a single database for easier consumption of check results across your enterprise.
2.10.2 Integrating Health Check Results with Third-Party Tool
Integrate health check results from Oracle ORAchk and Oracle EXAchk into various third-party log monitoring and analytics tools, such as Elasticsearch and Kibana.
Figure 2-23 Third-Party Tool Integration
Oracle ORAchk and Oracle EXAchk create JSON output results in the output upload directory, for example:
Report_Output_Dir/upload/mymachine_orachk_results.json
Report_Output_Dir/upload/mymachine_orachk_exceptions.json
Report_Output_Dir/upload/mymachine_exachk_results.json
Report_Output_Dir/upload/mymachine_exachk_exceptions.json
To integrate health check results with third-party tools:

1. Run the -syslog option to write JSON results to the syslog daemon. For example:

./orachk -syslog
./exachk -syslog

2. Verify the syslog configuration by running the following commands.

Oracle ORAchk and Oracle EXAchk use the message levels: CRIT, ERR, WARN, and INFO.

$ logger -p user.crit crit_message
$ logger -p user.err err_message
$ logger -p user.warn warn_message
$ logger -p user.info info_message

3. Verify in your configured message location, for example, /var/adm/messages, that each test message is written.
Related Topics:
• Elasticsearch: RESTful, Distributed Search & Analytics | Elastic
• Kibana: Explore, Visualize, Discover Data | Elastic
• Logging Alerts to the syslogd Daemon
2.10.3 Integrating Health Check Results with Custom Application
Oracle ORAchk and Oracle EXAchk upload collection results from multiple instances into a single database for easier consumption of check results across your enterprise.
Use Oracle Health Check Collections Manager or your own custom application to consume health check results.
1. Upload the collection results into the following tables at the end of a collection:

Table 2-4 Uploading Collection Results into a Database

Table                     What Gets Uploaded
rca13_docs                Full zipped collection results.
auditcheck_result         Health check results.
auditcheck_patch_result   Patch check results.

2. If you install Oracle Health Check Collections Manager, then these tables are created by the install script.

If the tables are not created, then use the following DDL statements:
• DDL for the RCA13_DOCS table
CREATE TABLE RCA13_DOCS (
DOC_ID NUMBER DEFAULT to_number(sys_guid(),'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') NOT NULL ENABLE,
COLLECTION_ID VARCHAR2(40 BYTE),
FILENAME VARCHAR2(1000 BYTE) NOT NULL ENABLE,
FILE_MIMETYPE VARCHAR2(512 BYTE),
FILE_CHARSET VARCHAR2(512 BYTE),
FILE_BLOB BLOB NOT NULL ENABLE,
FILE_COMMENTS VARCHAR2(4000 BYTE),
TAGS VARCHAR2(4000 BYTE),
ATTR1 VARCHAR2(200 BYTE),
UPLOADED_BY VARCHAR2(200 BYTE) DEFAULT USER,
UPLOADED_ON TIMESTAMP (6) DEFAULT systimestamp,
SR_BUG_NUM VARCHAR2(20 BYTE),
CONSTRAINT RCA13_DOCS_PK PRIMARY KEY (DOC_ID),
CONSTRAINT RCA13_DOCS_UK1 UNIQUE (FILENAME)
);
• DDL for the auditcheck_result table
CREATE TABLE auditcheck_result (
COLLECTION_DATE TIMESTAMP NOT NULL ENABLE,
CHECK_NAME VARCHAR2(256),
PARAM_NAME VARCHAR2(256),
STATUS VARCHAR2(256),
STATUS_MESSAGE VARCHAR2(256),
ACTUAL_VALUE VARCHAR2(256),
RECOMMENDED_VALUE VARCHAR2(256),
COMPARISON_OPERATOR VARCHAR2(256),
HOSTNAME VARCHAR2(256),
INSTANCE_NAME VARCHAR2(256),
CHECK_TYPE VARCHAR2(256),
DB_PLATFORM VARCHAR2(256),
OS_DISTRO VARCHAR2(256),
OS_KERNEL VARCHAR2(256),
OS_VERSION NUMBER,
DB_VERSION VARCHAR2(256),
CLUSTER_NAME VARCHAR2(256),
DB_NAME VARCHAR2(256),
ERROR_TEXT VARCHAR2(256),
CHECK_ID VARCHAR2(40),
NEEDS_RUNNING VARCHAR2(100),
MODULES VARCHAR2(4000),
DATABASE_ROLE VARCHAR2(100),
CLUSTERWARE_VERSION VARCHAR2(100),
GLOBAL_NAME VARCHAR2(256),
UPLOAD_COLLECTION_NAME VARCHAR2(256) NOT NULL ENABLE,
AUDITCHECK_RESULT_ID VARCHAR2(256) DEFAULT sys_guid() NOT NULL
ENABLE,
COLLECTION_ID VARCHAR2(40),
TARGET_TYPE VARCHAR2(128),
TARGET_VALUE VARCHAR2(256),
CONSTRAINT "AUDITCHECK_RESULT_PK" PRIMARY KEY ("AUDITCHECK_RESULT_ID")
);
• DDL for the auditcheck_patch_result table
CREATE TABLE auditcheck_patch_result (
COLLECTION_DATE TIMESTAMP(6) NOT NULL,
HOSTNAME VARCHAR2(256),
ORACLE_HOME_TYPE VARCHAR2(256),
ORACLE_HOME_PATH VARCHAR2(256),
ORACLE_HOME_VERSION VARCHAR2(256),
PATCH_NUMBER NUMBER,
CLUSTER_NAME VARCHAR2(256),
DESCRIPTION VARCHAR2(256),
PATCH_TYPE VARCHAR2(128),
APPLIED NUMBER,
UPLOAD_COLLECTION_NAME VARCHAR2(256),
RECOMMENDED NUMBER
);
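A custom application typically consumes these tables with straightforward queries, for example listing all non-passing checks in an uploaded collection. The sketch below uses an in-memory SQLite database as a stand-in for the Oracle schema (column types simplified from the VARCHAR2 definitions above); the table and column names follow the DDL, but the query and sample data are illustrative assumptions.

```python
import sqlite3

# SQLite stand-in for the Oracle auditcheck_result table (types simplified).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE auditcheck_result (
        collection_date        TEXT NOT NULL,
        check_name             TEXT,
        status                 TEXT,
        hostname               TEXT,
        upload_collection_name TEXT NOT NULL
    )
""")
rows = [
    ("2017-05-01 10:00:00", "OSWatcher status", "FAIL", "node1", "mycoll"),
    ("2017-05-01 10:00:00", "OSWatcher status", "PASS", "node2", "mycoll"),
    ("2017-05-01 10:00:00", "ASM disk group free space", "WARNING", "node1", "mycoll"),
]
conn.executemany("INSERT INTO auditcheck_result VALUES (?, ?, ?, ?, ?)", rows)

# Report all non-passing checks for a given uploaded collection.
failing = conn.execute(
    """SELECT hostname, check_name, status
       FROM auditcheck_result
       WHERE upload_collection_name = ? AND status <> 'PASS'
       ORDER BY hostname""",
    ("mycoll",),
).fetchall()
for host, check, status in failing:
    print(f"{host}: {check} -> {status}")
```

Against an Oracle database, the same SELECT statement works unchanged apart from the driver and bind-variable syntax.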
Related Topics:
• Uploading Collections Automatically
Configure Oracle ORAchk and Oracle EXAchk to upload check results automatically to the Oracle Health Check Collections Manager database.
2.10.1 Integrating Health Check Results with Oracle Enterprise Manager
Integrate health check results from Oracle ORAchk and Oracle EXAchk into Oracle
Enterprise Manager.
Oracle Enterprise Manager Cloud Control releases 13.1 and 13.2 support integration with Oracle ORAchk and Oracle EXAchk through the Oracle Enterprise Manager
ORAchk Healthchecks Plug-in. The Oracle Engineered System Healthchecks plug-in supported integration with EXAchk for Oracle Enterprise Manager Cloud Control 12c release 12.1.0.5 and earlier releases.
With Oracle Enterprise Manager Cloud Control 13.1, Oracle ORAchk and Oracle
EXAchk check results are integrated into the compliance framework. Integrating check results into the compliance framework enables you to display Compliance Framework
Dashboards and browse checks by compliance standards.
• Integrate check results into Oracle Enterprise Manager compliance framework.
• View health check results in native Oracle Enterprise Manager compliance dashboards.
Figure 2-24 Compliance Dashboard
• Related checks are grouped into compliance standards where you can view targets checked, violations, and average score.
Figure 2-25 Compliance Standards
• From within a compliance standard, drill down to see individual check results and break the results down by target.
Figure 2-26 Compliance Standards Drill-Down
Note:
Although Oracle ORAchk and Oracle EXAchk do not require additional licenses, you require applicable Oracle Enterprise Manager licenses.
Related Topics:
• Oracle Enterprise Manager ORAchk Healthchecks Plug-in User's Guide
• Oracle Enterprise Manager Licensing Information User Manual
2.11 Troubleshooting Oracle ORAchk and Oracle EXAchk
To troubleshoot and fix Oracle ORAchk and Oracle EXAchk issues, follow the steps explained in this section.
• How to Troubleshoot Oracle ORAchk and Oracle EXAchk Issues
To troubleshoot Oracle ORAchk and Oracle EXAchk issues, follow the steps explained in this section.
• How to Capture Debug Output
Follow these steps to capture debug information.
• Remote Login Problems
If the Oracle ORAchk and Oracle EXAchk tools have problems locating and running SSH or SCP, then the tools cannot run any remote checks.
• Permission Problems
You must have sufficient directory permissions to run Oracle ORAchk and Oracle EXAchk.
• Slow Performance, Skipped Checks and Timeouts
Follow these steps to fix slow performance and other issues.
Related Topics:
• Oracle ORAchk and EXAchk User’s Guide
2.11.1 How to Troubleshoot Oracle ORAchk and Oracle EXAchk Issues
To troubleshoot Oracle ORAchk and Oracle EXAchk issues, follow the steps explained in this section.
To troubleshoot Oracle ORAchk and Oracle EXAchk:
1. Ensure that you are using the correct tool.
Use Oracle EXAchk for Oracle Engineered Systems except for Oracle Database Appliance. For all other systems, use Oracle ORAchk.
2. Ensure that you are using the latest versions of Oracle ORAchk and Oracle EXAchk.
a. Check the version using the -v option:

$ ./orachk -v
$ ./exachk -v

b. Compare your version with the latest version available here:
• For Oracle ORAchk, refer to My Oracle Support Note 1268927.2.
• For Oracle EXAchk, refer to My Oracle Support Note 1070954.1.
3. Check the FAQ for similar problems in My Oracle Support Note 1070954.1.
4. Review the files within the log directory.
• Check the applicable error.log files for relevant errors.
The error.log files contain stderr output captured during the run.
– output_dir/log/orachk_error.log
– output_dir/log/exachk_error.log
• Check the applicable log for other relevant information.
– output_dir/log/orachk.log
– output_dir/log/exachk.log
5. Review My Oracle Support Notes for similar problems.
6. For Oracle ORAchk issues, check ORAchk (MOSC) in My Oracle Support Community (MOSC).
7. If necessary, capture the debug output, then log an SR and attach the resulting zip file.
Related Topics:
• https://support.oracle.com/rs?type=doc&id=1268927.2
• https://support.oracle.com/rs?type=doc&id=1070954.1
2.11.2 How to Capture Debug Output
Follow these steps to capture debug information.
To capture debug output:
1. Reproduce the problem with the fewest runs possible before enabling debug.
Debug captures a lot of data and the resulting zip file can be large, so try to narrow down the number of runs necessary to reproduce the problem. Use command-line options to limit the scope of the checks.
2. Enable debug.
If you are running the tool in on-demand mode, then use the -debug option:

$ ./orachk -debug
$ ./exachk -debug
For example:
$ ./orachk -debug
+ PS4='$(date "+ $LINENO: + ")'
36276: + [[ -z 1 ]]
36302: + sed 's/[\.\/]//g'
36302: + basename /global/u01/app/oracle/arch03/ORACLE_CHECK/ORACLE_SR/orachk
36302: + echo orachk
36302: + program_name=orachk
36303: + which bash
36303: + echo 0
36303: + bash_found=0
36304: + SSH_PASS_STATUS=0
36307: + set +u
36309: + '[' 0 -ne 0 ']'
36315: + raccheck_deprecate_msg='RACcheck has been deprecated. ORAchk provides the same functionality.
Please switch to using ORAchk from same directory.\n\nRACcheck will not be available after this (12.1.0.2.3) release.
See MOS Note "RACcheck Configuration Audit Tool Statement of Direction - name change to ORAchk (Doc ID 1591208.1)".\n'
36316: + '[' orachk = raccheck ']'
36325: + export LC_ALL=C
36325: + LC_ALL=C
36326: + NO_WRITE_PASS=0
36327: + ECHO=:
36328: + DEBUG=:
36329: + AUDITTAB=db_audit
36379: + supported_modules='PREUPGR
. . . . . .
. . . . . .
When you enable debug, Oracle ORAchk and Oracle EXAchk create a new debug log file in:
• output_dir/log/orachk_debug_date_stamp_time_stamp.log
• output_dir/log/exachk_debug_date_stamp_time_stamp.log
The debug output file contains:
• bash -x of the program on the local node
• bash -x of the program on all remote nodes
• bash -x of all dynamically generated and called scripts
– The output_dir directory retains various other temporary files used during health checks.
– If you run health checks using the daemon, then restart the daemon with the -d start_debug option.
Running the daemon with the -d start_debug option generates debug for the daemon and includes debug in all client runs:

$ ./orachk -d start_debug
$ ./exachk -d start_debug
When debug is run with the daemon, Oracle ORAchk and Oracle EXAchk create a daemon debug log file in the directory in which the daemon was started:

orachk_daemon_debug.log
exachk_daemon_debug.log

3. Collect the resulting output zip file and the daemon debug log file, if applicable.
2.11.3 Remote Login Problems
If the Oracle ORAchk and Oracle EXAchk tools have problems locating and running SSH or SCP, then the tools cannot run any remote checks.
Also, the root privileged commands do not work if:
• Passwordless remote root login is not permitted over SSH
• The Expect utility is not able to pass the root password
1. Verify that the SSH and SCP commands can be found.
• The SSH commands return the error -bash: /usr/bin/ssh -q: No such file or directory if SSH is not located where expected.
Set the RAT_SSHELL environment variable to point to the location of SSH:

$ export RAT_SSHELL=path to ssh

• The SCP commands return the error /usr/bin/scp -q: No such file or directory if SCP is not located where expected.
Set the RAT_SCOPY environment variable to point to the location of SCP:

$ export RAT_SCOPY=path to scp
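The fallback between the RAT_SSHELL override and the default ssh location can be sketched as follows. This is an illustrative approximation of the behavior described above, not the tools' actual implementation; the helper name is invented.

```python
import os
import shutil
from typing import Optional

def resolve_ssh() -> Optional[str]:
    """Return the ssh binary to use: the RAT_SSHELL override when it is
    set and executable, otherwise whatever is found on PATH. Returns
    None when neither resolves (illustrative approximation only)."""
    override = os.environ.get("RAT_SSHELL")
    if override and os.access(override, os.X_OK):
        return override
    return shutil.which("ssh")

# Point the tools at a specific binary, as with: export RAT_SSHELL=path
os.environ["RAT_SSHELL"] = "/bin/sh"   # any executable path, for illustration
print(resolve_ssh())
```

The same pattern applies to RAT_SCOPY and scp.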
2. Verify that the user you are running as can run the following command manually, from wherever you are running Oracle ORAchk and Oracle EXAchk, to whichever remote node is failing:

$ ssh root@remotehostname "id"
root@remotehostname's password:
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)

• If you face any problems running the command, then contact your systems administrators to correct them temporarily for running the tool.
• Oracle ORAchk and Oracle EXAchk search for prompts or traps in remote user profiles. If you have prompts in remote profiles, then comment them out, at least temporarily, and test run again.
• If you can configure passwordless remote root login, then edit the /etc/ssh/sshd_config file to change the PermitRootLogin setting from no to yes.
Then restart the SSH daemon (for example, service sshd restart) as root on all nodes of the cluster.
3. Enable Expect debugging.
• Oracle ORAchk uses the Expect utility, when available, to answer password prompts and connect to remote nodes for password validation, and to run root collections without logging the actual connection process.
• Set environment variables to help debug remote target connection issues.
– RAT_EXPECT_DEBUG: If this variable is set to -d, then Expect command tracing is activated. The trace information is written to the standard output. For example:

export RAT_EXPECT_DEBUG=-d

– RAT_EXPECT_STRACE_DEBUG: If this variable is set to strace, then strace calls the Expect command. The trace information is written to the standard output. For example:

export RAT_EXPECT_STRACE_DEBUG=strace

• By varying the combinations of these two variables, you can get three levels of Expect connection trace information.
Note:
Set the RAT_EXPECT_DEBUG and RAT_EXPECT_STRACE_DEBUG variables only at the direction of Oracle Support or Development. These variables are used with other variables and user interface options to restrict the amount of data collected during the tracing. The script command is used to capture standard output.
As a temporary workaround while you resolve remote problems, run reports locally on each node, then merge them together later.
On each node, run:

$ ./orachk -local
$ ./exachk -local

Then merge the collections to obtain a single report:

$ ./orachk -merge zipfile_1 zipfile_2 [zipfile_3 ...]
$ ./exachk -merge zipfile_1 zipfile_2 [zipfile_3 ...]
2.11.4 Permission Problems
You must have sufficient directory permissions to run Oracle ORAchk and Oracle
EXAchk.
1. Verify that the permissions on the tool scripts orachk and exachk are set to 755 (rwxr-xr-x).
If the permissions are not set, then set them as follows:

$ chmod 755 orachk
$ chmod 755 exachk

2. If you install Oracle ORAchk and Oracle EXAchk as root and run the tools as a different user, then you may not have the necessary directory permissions.
[root@randomdb01 exachk]# ls -la
total 14072
drwxr-xr-x 3 root root 4096 Jun 7 08:25 .
drwxrwxrwt 12 root root 4096 Jun 7 09:27 ..
drwxrwxr-x 2 root root 4096 May 24 16:50 .cgrep
-rw-rw-r-- 1 root root 9099005 May 24 16:50 collections.dat
-rwxr-xr-x 1 root root 807865 May 24 16:50 exachk
-rw-r--r-- 1 root root 1646483 Jun 7 08:24 exachk.zip
-rw-r--r-- 1 root root 2591 May 24 16:50 readme.txt
-rw-rw-r-- 1 root root 2799973 May 24 16:50 rules.dat
-rw-r--r-- 1 root root 297 May 24 16:50 UserGuide.txt
In that case, you must run the tools as root, or unzip the tool again as the Oracle software installation user.
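A quick way to check and, if needed, repair the script permissions programmatically (the equivalent of the chmod 755 commands above; the helper name is illustrative):

```python
import os
import stat
import tempfile

def ensure_executable(path: str) -> str:
    """Set 755 (rwxr-xr-x) on the given script if not already set,
    and return the resulting permission bits as an octal string."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != 0o755:
        os.chmod(path, 0o755)
        mode = 0o755
    return oct(mode)

# Demonstrate on a temporary file standing in for the orachk script.
fd, script = tempfile.mkstemp()
os.close(fd)
print(ensure_executable(script))   # 0o755
os.remove(script)
```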
2.11.5 Slow Performance, Skipped Checks and Timeouts
Follow these steps to fix slow performance and other issues.
When Oracle ORAchk and Oracle EXAchk run commands, a child process is spawned to run the command and a watchdog daemon monitors the child process. If the child process is slow or hung, then the watchdog kills the child process and the check is registered as skipped:
Figure 2-27 Skipped Checks
The watchdog.log file also contains entries similar to killing stuck command.
Depending on the cause of the problem, you may not see skipped checks.
1. Determine if there is a pattern to what is causing the problem.
• EBS checks, for example, depend on the amount of data present and may take longer than the default timeout.
• Remote checks may time out, be killed, and skipped if there are prompts in the remote profile. Oracle ORAchk and Oracle EXAchk search for prompts or traps in the remote user profiles. If you have prompts in remote profiles, then comment them out, at least temporarily, and test run again.
2. Increase the default timeout.
• Override the default timeouts by setting the environment variables described in the following table.

Table 2-5 Timeout Controlling

Timeout Controlling                Default Value (seconds)   Environment Variable
Checks not run by root (most).     90                        RAT_TIMEOUT
Collection of all root checks.     300                       RAT_ROOT_TIMEOUT
SSH login DNS handshake.           1                         RAT_PASSWORDCHECK_TIMEOUT

• The default timeouts are designed to be lengthy enough for most cases. If the timeout is not long enough, then it is possible you are experiencing a system performance problem. Many timeouts can be indicative of a problem in the environment rather than in Oracle ORAchk or Oracle EXAchk.
3. If it is not acceptable to increase the timeout to the point where nothing fails, then try excluding problematic checks, running them separately with a large enough timeout, and then merging the reports back together.
4. If the problem does not appear to be due to slow or skipped checks but you have a large cluster, then try increasing the number of slave processes used for parallel database runs.
• Database collections are run in parallel. The default number of slave processes used for a parallel database run is calculated automatically. Change the default number using the -dbparallel slave_processes or -dbparallelmax options.

Note:
The higher the parallelism, the more resources are consumed. However, the elapsed time is reduced.

Raise or lower the number of parallel slaves beyond the default value.
After the entire system is brought up after maintenance, but before users are permitted on the system, use a higher number of parallel slaves to finish a run as quickly as possible.
On a busy production system, use a number less than the default value, yet more than running in serial mode, to complete the run more quickly with less impact on the running system.
Turn off the parallel database run using the -dbserial option.
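The timeout variables in Table 2-5 are read from the environment of the shell that launches the tools. A wrapper that raises them for a single run, without changing the parent shell, might look like the following sketch; the numbers are examples, not recommendations, and the child process here merely stands in for ./orachk.

```python
import os
import subprocess
import sys

# Raise the timeouts for one run only, leaving the parent shell untouched.
env = os.environ.copy()
env["RAT_TIMEOUT"] = "180"        # default 90: checks not run as root
env["RAT_ROOT_TIMEOUT"] = "600"   # default 300: collection of all root checks

# A child process standing in for ./orachk; it simply echoes the values it sees.
out = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['RAT_TIMEOUT'], os.environ['RAT_ROOT_TIMEOUT'])"],
    env=env, capture_output=True, text=True, check=True,
).stdout.strip()
print(out)  # 180 600
```

The shell equivalent is simply to export the variables before invoking the tool.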
3
Collecting Operating System Resources Metrics
Use Cluster Health Monitor to collect diagnostic data to analyze performance degradation or failures of critical operating system resources.
This chapter describes how to use Cluster Health Monitor and contains the following sections:
Topics:
• Understanding Cluster Health Monitor Services
Cluster Health Monitor uses system monitor (osysmond) and cluster logger (ologgerd) services to collect diagnostic data.
• Collecting Cluster Health Monitor Data
Collect Cluster Health Monitor data from any node in the cluster by running the Grid_home/bin/diagcollection.pl script on the node.
• Using Cluster Health Monitor from Enterprise Manager Cloud Control
Histograms presented in real-time and historical modes enable you to understand precisely what was happening at the time of degradation or failure.
Related Topics:
• Introduction to Cluster Health Monitor
Cluster Health Monitor is a component of Oracle Grid Infrastructure, which continuously monitors and stores Oracle Clusterware and operating system resources metrics.
3.1 Understanding Cluster Health Monitor Services
Cluster Health Monitor uses system monitor (osysmond) and cluster logger (ologgerd) services to collect diagnostic data.
About the System Monitor Service
The system monitor service (osysmond) is a real-time monitoring and operating system metric collection service that runs on each cluster node. The system monitor service is managed as a High Availability Services (HAS) resource. The system monitor service forwards the collected metrics to the cluster logger service, ologgerd. The cluster logger service stores the data in the Oracle Grid Infrastructure Management Repository database.
About the Cluster Logger Service
The cluster logger service (ologgerd) is responsible for preserving the data collected by the system monitor service (osysmond) in the Oracle Grid Infrastructure Management Repository database. In a cluster, there is one cluster logger service (ologgerd) per 32 nodes. More logger services are spawned for every additional 32 nodes. The additional nodes can be a sum of Hub and Leaf Nodes. Oracle Clusterware relocates and starts the service on a different node, if:
• The logger service fails and is not able to come up after a fixed number of retries
• The node where the cluster logger service is running, is down
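The sizing rule above (one ologgerd per 32 nodes, with another for every additional 32) implies the number of cluster logger services for a given cluster size; a small sketch of the arithmetic:

```python
import math

def logger_service_count(node_count: int) -> int:
    """One cluster logger service (ologgerd) per 32 nodes, with another
    spawned for every additional 32 nodes (Hub plus Leaf Nodes)."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return math.ceil(node_count / 32)

for nodes in (8, 32, 33, 64, 65):
    print(nodes, "->", logger_service_count(nodes))
```

For example, a 33-node cluster runs two logger services; a 65-node cluster runs three.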
3.2 Collecting Cluster Health Monitor Data
Collect Cluster Health Monitor data from any node in the cluster by running the Grid_home/bin/diagcollection.pl script on the node.
When an Oracle Clusterware error occurs, run the diagcollection.pl diagnostics collection script to collect diagnostic information from Oracle Clusterware into trace files.
Run the diagcollection.pl script as root from the Grid_home/bin directory.
Note:
• Oracle recommends that you run the diagcollection.pl script on all nodes in the cluster to collect Cluster Health Monitor data. Running the script on all nodes ensures that you gather all information needed for analysis.
• Run the diagcollection.pl script as a root privileged user.
1. To run the data collection script only on the node where the cluster logger service is running, determine that node by running the following command:

$ Grid_home/bin/oclumon manage -get master

2. Log in as a root privileged user, and change directory to a writable directory outside the Grid home.
3. Run the command diagcollection.pl --collect.
For example:
Linux:

$ Grid_home/bin/diagcollection.pl --collect

Microsoft Windows:

C:\Grid_home\perl\bin\perl.exe C:\Grid_home\bin\diagcollection.pl --collect

Running the command mentioned earlier collects all the available data in the Oracle Grid Infrastructure Management Repository, and creates a file using the format chmosData_host_name_time_stamp.tar.gz.
For example: chmosData_stact29_20121006_2321.tar.gz
4. Run the command $ Grid_home/bin/diagcollection.pl --collect --chmos --incidenttime time --incidentduration duration to limit the amount of data collected.
In the command mentioned earlier, the format for the --incidenttime argument is MM/DD/YYYY24HH:MM:SS and the format for the --incidentduration argument is HH:MM.
For example:
$ Grid_home/bin/diagcollection.pl --collect --crshome Grid_home
--chmos --incidenttime 07/21/2013 01:00:00 --incidentduration 00:30
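Building the --incidenttime and --incidentduration arguments by hand is error-prone; a small helper that formats them from a timestamp and a duration can be sketched as follows (the space between the date and time portions follows the example above; the helper name is illustrative).

```python
from datetime import datetime, timedelta

def incident_args(start, duration):
    """Format diagcollection.pl --incidenttime (MM/DD/YYYY HH:MM:SS,
    24-hour clock) and --incidentduration (HH:MM) argument values."""
    hours, remainder = divmod(int(duration.total_seconds()), 3600)
    minutes = remainder // 60
    return [
        "--incidenttime", start.strftime("%m/%d/%Y %H:%M:%S"),
        "--incidentduration", f"{hours:02d}:{minutes:02d}",
    ]

args = incident_args(datetime(2013, 7, 21, 1, 0, 0), timedelta(minutes=30))
print(" ".join(args))
# --incidenttime 07/21/2013 01:00:00 --incidentduration 00:30
```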
Related Topics:
• Running the diagnostics collection script provides additional information so that My Oracle Support can resolve problems.
• Use the tfactl diagcollect command to perform on-demand diagnostic collection.
• Oracle Trace File Analyzer Command-Line and Shell Options
The Trace File Analyzer control utility, TFACTL, is the command-line interface for Oracle Trace File Analyzer.
• Oracle Trace File Analyzer Collector On-Demand Diagnostic Collections
Use Oracle Trace File Analyzer Collector to request an on-demand collection if you have determined a problem has occurred, and you are not able to resolve it.
3.3 Using Cluster Health Monitor from Enterprise Manager Cloud Control
Histograms presented in real-time and historical modes enable you to understand precisely what was happening at the time of degradation or failure.
The metric data from Cluster Health Monitor is available in graphical display within Enterprise Manager Cloud Control. Complete cluster views of this data are accessible from the cluster target page. Selecting the Cluster Health Monitoring menu item from the Cluster menu presents a login screen prompting for the Cluster Health Monitor credentials. There is a fixed user, EMUSER, and the password is user-specified. Once the credentials are saved, you can then view Cluster Health Monitor data for the last day in overview format for the entire cluster. The metric categories are CPU, Memory, and Network.
Each category can be displayed separately in greater detail, showing more metrics. For example, selecting CPU results in cluster graphs detailing CPU System Usage, CPU User Usage, and CPU Queue Length. From any cluster view, you can select individual node views to more closely examine the performance of a single server. As in the case of CPU, the performance of each core is displayed. Move your cursor along the graph to see a tool tip displaying the numerical values and time stamp of that point.
Besides examining the performance of the current day, you can also review historical data. The amount of historical data is governed by the retention time configured in the Cluster Health Monitor repository in the Grid Infrastructure Management Repository, and defaults to 72 hours. This view is selectable at any time by using the View Mode drop-down menu and selecting Historical. A previous date can then be entered or selected from a pop-up calendar, in which dates with available data are bolded. Selecting Show Chart then displays the associated metric graphs.
To view Cluster Health Monitor data:
1. Log in to Enterprise Manager Cloud Control.
2. Select the Cluster Target you want to view.
3. From the Cluster drop-down list, select the Cluster Health Monitoring option.
Figure 3-1 EMCC - Cluster Health Monitoring
4. Enter the Cluster Health Monitor login credentials.
5. From the View Mode drop-down list, select the Real Time option to view the current data.
By default, EMCC displays the Overview of resource utilization. You can filter by CPU, Memory, and Network by selecting an appropriate option from the Select Chart Type drop-down list.
While viewing CPU and Network metric graphs, click a node name on the legend to view more details.
Figure 3-2 Cluster Health Monitoring - Real Time Data
6. From the View Mode drop-down list, select the Historical option to view data for the last 24 hours.
a. To filter historical data by date, select a day on the Select Date calendar control and then click Show Chart.
By default, EMCC displays the Overview of resource utilization. You can filter by CPU, Memory, and Network by selecting an appropriate option from the Select Chart Type drop-down list.
While viewing CPU and Network metric graphs, click a node name on the legend to view more details.
Figure 3-3 Cluster Health Monitoring - Historical Data
4
Collecting Diagnostic Data and Triaging, Diagnosing, and Resolving Issues
Use Oracle Trace File Analyzer to collect comprehensive diagnostic data, saving you time and money.
This chapter describes how to use Oracle Trace File Analyzer Collector and contains the following sections:
Topics:
• Understanding Oracle Trace File Analyzer
Oracle Trace File Analyzer Collector and Oracle Trace File Analyzer simplify collecting diagnostic data and resolving issues.
• Getting Started with Oracle Trace File Analyzer
This section introduces you to installing and configuring Oracle Trace File Analyzer.
• Automatically Collecting Diagnostic Data Using the Oracle Trace File Analyzer
Manage the Oracle Trace File Analyzer Collector daemon, diagnostic collections, and the collection repository.
• Analyzing the Problems Identified
Use the tfactl command to perform further analysis against the database when you have identified a problem and you need more information.
• Manually Collecting Diagnostic Data
This section explains how to manually collect diagnostic data.
• Analyzing and Searching Recent Log Entries
Use the tfactl analyze command to analyze and search recent log entries.
• Managing Oracle Database and Oracle Grid Infrastructure Diagnostic Data
This section enables you to manage Oracle Database and Oracle Grid Infrastructure diagnostic data and disk usage snapshots.
• Upgrading Oracle Trace File Analyzer Collector by Applying a Patch Set Update
Always upgrade to the latest version whenever possible to include bug fixes, new features, and optimizations.
• Troubleshooting Oracle Trace File Analyzer
Enable specific trace levels when reproducing a problem to obtain sufficient diagnostics.
Related Topics:
• Introduction to Oracle Trace File Analyzer Collector
Oracle Trace File Analyzer Collector is a utility for targeted diagnostic collection that simplifies diagnostic data collection for Oracle Clusterware, Oracle Grid
Infrastructure, and Oracle Real Application Clusters (Oracle RAC) systems, in addition to single instance, non-clustered databases.
4.1 Understanding Oracle Trace File Analyzer
Oracle Trace File Analyzer Collector and Oracle Trace File Analyzer simplify collecting diagnostic data and resolving issues.
Oracle Trace File Analyzer Collector does the following:
• Automatically detects significant Oracle database and Oracle Grid Infrastructure problems
• Executes diagnostics and collection of log files
• Trims log files around relevant time periods
• Coordinates collection around the cluster
• Packages all diagnostics in a single package on a single node
• Provides the tfactl command-line interface and shell, which simplify the usage of the database support tools
Oracle Trace File Analyzer uses data collected by Oracle Trace File Analyzer Collector to provide the following:
• Summary report of configured systems, changes, events, and system health
• Analysis of common error log messages
• Oracle Trace File Analyzer Architecture
• Oracle Trace File Analyzer Automated Diagnostic Collections
When running in daemon mode, Oracle Trace File Analyzer monitors important Oracle logs for events symptomatic of a significant problem.
• Oracle Trace File Analyzer Collector On-Demand Diagnostic Collections
Use Oracle Trace File Analyzer Collector to request an on-demand collection if you have determined a problem has occurred, and you are not able to resolve it.
4.1.1 Oracle Trace File Analyzer Architecture
Oracle Trace File Analyzer and Oracle Trace File Analyzer Collector use a single daemon on the database server. If the database is clustered, then a daemon runs on each node of the cluster.
Figure 4-1 Oracle Trace File Analyzer Architecture
Control Oracle Trace File Analyzer and Oracle Trace File Analyzer Collector through the command-line interface tfactl, which can either be used in single-command fashion or as a command shell.
The tfactl command communicates with the local daemon, which then coordinates with all daemons in the cluster. Each daemon runs the necessary diagnostic scripts locally, collects, and then trims local logs.
All daemons coordinate to create the resulting cluster-wide collection on the node where the tfactl command was run. If the collection was initiated automatically, then the email notification contains the location of the cluster-wide collection.
4.1.2 Oracle Trace File Analyzer Automated Diagnostic Collections
When running in daemon mode, Oracle Trace File Analyzer monitors important Oracle logs for events symptomatic of a significant problem.
Based on the event type detected, Oracle Trace File Analyzer then starts an automatic diagnostic collection.
The data collected depends on the event detected. Oracle Trace File Analyzer coordinates the collection around the cluster, and trims the logs around relevant time periods, and then packs all collection results into a single package on one node.
Oracle Trace File Analyzer does not do a collection for every event detected. When an event is first identified, Oracle Trace File Analyzer triggers the start point for a collection and then waits for five minutes before starting diagnostic gathering. The purpose of waiting for five minutes is to capture any other relevant events together.
• If events are still occurring after five minutes, then Oracle Trace File Analyzer waits, for up to a further five minutes, for a period of 30 seconds with no events occurring before completing the diagnostic collection.
• If events are still occurring 10 minutes after first detection, then Oracle Trace File
Analyzer forces a diagnostic collection and generates a new collection start point for the next event.
Once the collection is complete, Oracle Trace File Analyzer sends email notification that includes the details of where the collection results are, to the relevant recipients.
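The timing rules above amount to an event debounce. The following sketch models the decision logic as described; it is an illustration of the documented behavior, not Oracle's implementation, and the function and thresholds are spelled out only for clarity.

```python
def should_collect(first_event: float, last_event: float, now: float) -> bool:
    """Decide whether to start the diagnostic collection, per the rules:
    wait 5 minutes after the first event; between 5 and 10 minutes,
    require 30 seconds of quiet; force collection at 10 minutes.
    All arguments are timestamps in seconds."""
    since_first = now - first_event
    since_last = now - last_event
    if since_first >= 600:          # 10 minutes: force collection
        return True
    if since_first >= 300:          # 5-10 minutes: need a 30-second quiet gap
        return since_last >= 30
    return False                    # first 5 minutes: always wait

print(should_collect(0, 0, 200))     # False: still in the initial 5-minute wait
print(should_collect(0, 290, 320))   # True: quiet for 30 seconds after 5 minutes
print(should_collect(0, 595, 600))   # True: forced at the 10-minute mark
```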
Figure 4-2 Automatic Diagnostic Collections
Table 4-1 Trigger Automatic Event Detection
String Pattern
ORA-31(13/37)
ORA-603
ORA-00700
ORA-35(3|5|6)
ORA-40(20|36)
ORA-403(0|1)
ORA-2(27|39|40|55)
ORA-1578
ORA-2(5319|4982)
ORA-56729
OCI-31(06|35)
ORA-445
ORA-00600
ORA-07445
ORA-4(69|([7-8][0-9]|9([0-3]|[5-8])))
ORA-297(01|02|03|08|09|10|40|70|71)
ORA-3270(1|3|4)
ORA-494
System State dumped
CRS-16(07|10|11|12)
Logs Monitored
Alert Log - DB
Alert Log - ASM
Alert Log - ASM Proxy
Alert Log - ASM IO Server
Alert Log - CRS
Related Topics:
• Configuring Email Notification Details
Configure Oracle Trace File Analyzer to send an email when an automatic collection completes to the email address that is registered with Oracle Trace File
Analyzer.
• Purging the Repository Automatically
4.1.3 Oracle Trace File Analyzer Collector On-Demand Diagnostic Collections
Use Oracle Trace File Analyzer Collector to request an on-demand collection if you have determined a problem has occurred, and you are not able to resolve it.
Provide the resulting collection to Oracle Customer Support to help you diagnose and resolve the problem.
• Types of On-Demand Collections
Oracle Trace File Analyzer Collector performs three types of on-demand collections. Use the tfactl diagcollect command to perform all on-demand collections.
4.1.3.1 Types of On-Demand Collections
Oracle Trace File Analyzer Collector performs three types of on-demand collections. Use the tfactl diagcollect command to perform all on-demand collections.
Figure 4-3 On-Demand Collections
• Default Collections
Default collections gather all important files and diagnostics from all nodes for all components where the file has been updated within a particular time frame. Oracle Trace File Analyzer trims files that it deems to be excessive.
The standard time period used for the default collections is the past 12 hours. However, you can adjust it to any other time period.
• Event Driven SRDC Collections
Event-driven Service Request Data Collections (SRDC) gather all important files and diagnostics related to a particular event, such as an error.
The files and diagnostics collected depend on the event the SRDC collection is for. Oracle Trace File Analyzer prompts you for any other important information that it needs. Providing this information enables Oracle Trace File Analyzer to understand how best to collect diagnostics for each event.
• Custom Collections
Custom collections give you granular control over exactly what is collected, how, and from where.
Related Topics:
• Running On-Demand Default Collections
Use the tfactl diagcollect command to request a collection.
• Running On-Demand Event-Driven SRDC Collections
Use the diagcollect -srdc option to collect diagnostics needed for an Oracle Support Service Request Data Collection (SRDC).
• Use the tfactl diagcollect command to perform on-demand diagnostic collection.
• Running On-Demand Custom Collections
Use the custom collection options to collect diagnostic data from specific nodes, components, and directories.
4.2 Getting Started with Oracle Trace File Analyzer
This section introduces you to installing and configuring Oracle Trace File Analyzer.
• Supported Platforms and Product Versions
Review the supported platforms and product versions for Oracle Trace File Analyzer and Oracle Trace File Analyzer Collector.
• Oracle Grid Infrastructure Trace File Analyzer Installation
Oracle Trace File Analyzer is automatically configured as part of the Oracle Grid Infrastructure configuration when running root.sh or rootupgrade.sh.
• Oracle Database Trace File Analyzer Installation
Oracle Trace File Analyzer is installed as part of the database installation.
• Securing Access to Oracle Trace File Analyzer
Running tfactl commands is restricted to authorized users.
• Masking Sensitive Data
Configure Oracle Trace File Analyzer to mask sensitive data in log files.
• Configuring Email Notification Details
Configure Oracle Trace File Analyzer to send an email to the registered email address when an automatic collection completes.
4.2.1 Supported Platforms and Product Versions
Review the supported platforms and product versions for Oracle Trace File Analyzer and Oracle Trace File Analyzer Collector.
Oracle Trace File Analyzer and Oracle Trace File Analyzer Collector are supported on the following operating systems:
• Linux OEL
• Linux RedHat
• Linux SuSE
• Linux Itanium
• zLinux
• Oracle Solaris SPARC
• Oracle Solaris x86-64
• AIX
• HPUX Itanium
• HPUX PA-RISC
• Microsoft Windows
Oracle Trace File Analyzer and Oracle Trace File Analyzer Collector are supported with Oracle Grid Infrastructure and/or Oracle Database versions 10.2 or later.
4.2.2 Oracle Grid Infrastructure Trace File Analyzer Installation
Oracle Trace File Analyzer is automatically configured as part of the Oracle Grid Infrastructure configuration when running root.sh or rootupgrade.sh.
Two tfa directories are created when Oracle Trace File Analyzer is installed as part of the Oracle Grid Infrastructure:
• Grid_home/tfa: This directory contains the Oracle Trace File Analyzer executables and some configuration files.
• ORACLE_BASE/tfa: Where ORACLE_BASE is the Oracle Grid Infrastructure owner's ORACLE_BASE. This directory contains the Oracle Trace File Analyzer metadata files and logs.
Note:
The ORACLE_BASE can be on a shared file system because Oracle Trace File Analyzer creates a node-specific directory under the tfa directory.
Oracle Trace File Analyzer uses JRE version 1.8, which is shipped as part of the Oracle Grid Infrastructure 12.2 or Oracle Database 12.2 home.
By default, Oracle Trace File Analyzer is configured to start automatically. The automatic start implementation is platform-dependent.
For example:
• Linux: automatic restart is accomplished by using init, or an init replacement such as upstart or systemd.
• Microsoft Windows: automatic restart is implemented as a Windows service.
Oracle Trace File Analyzer is not managed as one of the Cluster Ready Services (CRS) because it must be available if CRS is down.
• Start Oracle Trace File Analyzer as follows:
Grid_home/tfa/bin/tfactl start
For example:
$ /u01/app/12.2.0/grid/tfa/bin/tfactl start
Starting TFA..
start: Job is already running: oracle-tfa
Waiting up to 100 seconds for TFA to be started..
. . . . .
Successfully started TFA Process..
. . . . .
TFA Started and listening for commands
• Stop Oracle Trace File Analyzer as follows:
Grid_home/tfa/bin/tfactl stop
For example:
$ /u01/app/12.2.0/grid/tfa/bin/tfactl stop
Stopping TFA from the Command Line
Stopped OSWatcher
TFA is running - Will wait 5 seconds (up to 3 times)
TFA-00518 Oracle Trace File Analyzer (TFA) is not running (stopped)
TFA Stopped Successfully
. . .
Successfully stopped TFA..
Note:
In the preceding example output, "Stopped OSWatcher" is seen only if you are using the download from My Oracle Support, because OSWatcher is included only in that download and not in the Oracle Grid Infrastructure or Oracle Database installation.
4.2.3 Oracle Database Trace File Analyzer Installation
Oracle Trace File Analyzer is installed as part of the database installation.
Oracle recommends that you run Oracle Trace File Analyzer in daemon mode, which is configured as the root user.
1. To configure daemon mode as root:
Either choose an appropriate option when running root.sh or rootupgrade.sh, or configure post-install by running the tfa_home/install/roottfa.sh script.
When you choose this option, Oracle Trace File Analyzer is installed in the ORACLE_BASE of the current installation owner.
2. To use Oracle Trace File Analyzer in non-daemon mode, access it from ORACLE_HOME/suptools/tfa/release/tfa_home using:
$ ORACLE_HOME/suptools/tfa/release/tfa_home/bin/tfactl command
When a user runs tfactl for the first time, Oracle Trace File Analyzer determines and creates the TFA_BASE directory structure. Oracle Trace File Analyzer maintains a configuration and trace file metadata database for every user who runs tfactl.
In non-daemon mode, the ability of a user to run the tfactl command determines the Oracle Trace File Analyzer access control list. A user can run tfactl if the user has operating system permission to run tfactl. However, tfactl collects only the data that the user has operating system permission to read.
When Oracle Trace File Analyzer is installed in daemon mode, the Oracle Trace File Analyzer daemon runs as root. The ability of a user to access Oracle Trace File Analyzer depends on that user being given specific access rights using the tfactl access command.
If a user has access rights to Oracle Trace File Analyzer, then Oracle Trace File Analyzer collects any files from its diagnostic directories that are marked as public. To restrict a directory to Oracle Trace File Analyzer users with sufficient permissions, specify the directory as private when adding it to Oracle Trace File Analyzer. You can also modify this setting later by using the tfactl directory modify command.
Related Topics:
• Use the tfactl access command to allow non-root users controlled access to Oracle Trace File Analyzer, and to run diagnostic collections.
• Use the tfactl print command to print information from the Berkeley database.
• Use the tfactl directory command to add a directory to, or remove a directory from, the list of directories whose trace and log files are analyzed.
4.2.4 Securing Access to Oracle Trace File Analyzer
Running tfactl commands is restricted to authorized users.
tfactl provides a command-line interface and shell to:
• Run any desired diagnostics and collect all relevant log data from a time of your choosing
• Trim log files around the time, collecting only what is necessary for diagnosis
• Collect and package all trimmed diagnostics, from any desired nodes in the cluster and consolidate everything in one package on a single node
Authorized non-root users can run a subset of the tfactl commands. All other tfactl commands require root access. Users who are not authorized cannot run any tfactl command.
By default, the following users are authorized to access a subset of tfactl commands:
• Oracle Grid Infrastructure home owner
• Oracle Database home owners
To provision user access to tfactl:
1. To list the users who have access to tfactl:
tfactl access lsusers
2. To add a user to access tfactl:
tfactl access add -user user [-local]
By default, access commands apply cluster-wide unless -local is used to restrict them to the local node.
3. To remove a user from accessing tfactl:
tfactl access remove -user user [-local]
4. To remove all users from accessing tfactl:
tfactl access removeall [-local]
5. To reset user access to default:
tfactl access reset
4.2.5 Masking Sensitive Data
Configure Oracle Trace File Analyzer to mask sensitive data in log files.
Masking sensitive data is an optional feature. When configured, Oracle Trace File Analyzer masks information such as host names and IP addresses, replacing sensitive data consistently throughout all files. Consistent replacement means that the information remains relevant and useful for the purposes of diagnosis without sharing any sensitive data.
To configure masking:
1. Create a file called mask_strings.xml in the directory tfa_home/resources.
2. Define a mask_strings element, and within it a mask_string element with an original and a replacement value for each string you wish to replace.
For example:
<mask_strings>
<mask_string>
<original>WidgetNode1</original>
<replacement>Node1</replacement>
</mask_string>
<mask_string>
<original>192.168.5.1</original>
<replacement>Node1-IP</replacement>
</mask_string>
<mask_string>
<original>WidgetNode2</original>
<replacement>Node2</replacement>
</mask_string>
<mask_string>
<original>192.168.5.2</original>
<replacement>Node2-IP</replacement>
</mask_string>
</mask_strings>
Oracle Trace File Analyzer automatically locates the mask_strings.xml files and starts replacing the sensitive data in the diagnostics it collects.
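The masking file can also be generated from the shell. A minimal sketch, written to the current directory for illustration (the host name and IP below are placeholders; on a real system the file must be placed in tfa_home/resources):

```shell
# Write a minimal mask_strings.xml (demo copy in the current directory;
# copy it to tfa_home/resources on a real installation)
cat > mask_strings.xml <<'EOF'
<mask_strings>
  <mask_string>
    <original>WidgetNode1</original>
    <replacement>Node1</replacement>
  </mask_string>
  <mask_string>
    <original>192.168.5.1</original>
    <replacement>Node1-IP</replacement>
  </mask_string>
</mask_strings>
EOF
cat mask_strings.xml
```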
4.2.6 Configuring Email Notification Details
Configure Oracle Trace File Analyzer to send an email to the registered email address when an automatic collection completes.
Configure the system on which Oracle Trace File Analyzer is running to send emails.
Otherwise, the email notification feature does not work.
To configure email notification details:
1. To set the notification email address for a specific ORACLE_HOME, include the operating system owner in the command: tfactl set notificationAddress=os_user:email
For example: tfactl set notificationAddress=oracle:[email protected]
2. To set the notification email address for any ORACLE_HOME: tfactl set notificationAddress=email
For example: tfactl set [email protected]
3. Do the following after receiving the notification email:
a. Inspect the referenced collection details to determine whether you know the root cause.
b. Resolve the underlying cause of the problem if you know how.
c. If you do not know the root cause of the problem, then log an SR with Oracle Support and upload the provided collection details.
Related Topics:
• Oracle Trace File Analyzer Automated Diagnostic Collections
When running in daemon mode, Oracle Trace File Analyzer monitors important Oracle logs for events symptomatic of a significant problem.
4.3 Automatically Collecting Diagnostic Data Using the Oracle Trace File Analyzer Collector
Manage the Oracle Trace File Analyzer Collector daemon, diagnostic collections, and the collection repository.
In addition, add hosts to the Oracle Trace File Analyzer Collector configuration, modify default communication ports, and configure SSL protocol.
• Managing the Oracle Trace File Analyzer Daemon
Oracle Trace File Analyzer Collector runs out of init on UNIX systems, or init/upstart/systemd on Linux systems, so that Oracle Trace File Analyzer Collector starts automatically whenever a node starts.
• Viewing the Status and Configuration of Oracle Trace File Analyzer
View the status of Oracle Trace File Analyzer across all the nodes in the cluster using either the tfactl print status or tfactl print config command.
• Configuring the Host
You must have root or sudo access to tfactl to add hosts to the Oracle Trace File Analyzer configuration.
• Configuring the Ports
The Oracle Trace File Analyzer daemons in a cluster communicate securely over ports 5000 to 5005.
• Configuring SSL and SSL Certificates
View and restrict SSL/TLS protocols. Configure Oracle Trace File Analyzer to use a self-signed or CA-signed certificate.
• Managing Collections
Manage directories configured in Oracle Trace File Analyzer and diagnostic collections.
• Managing the Repository
Oracle Trace File Analyzer stores all diagnostic collections in the repository.
4.3.1 Managing the Oracle Trace File Analyzer Daemon
Oracle Trace File Analyzer Collector runs out of init on UNIX systems, or init/upstart/systemd on Linux systems, so that Oracle Trace File Analyzer Collector starts automatically whenever a node starts.
To manage the Oracle Trace File Analyzer daemon:
The init control file /etc/init.d/init.tfa is platform dependent.
1. To manually start or stop Oracle Trace File Analyzer:
• tfactl start: Starts the Oracle Trace File Analyzer daemon
• tfactl stop: Stops the Oracle Trace File Analyzer daemon
If the Oracle Trace File Analyzer daemon fails, then the operating system restarts the daemon automatically.
2. To enable or disable automatic restarting of the Oracle Trace File Analyzer daemon:
• tfactl enable: Enables automatic restarting of the Oracle Trace File Analyzer daemon
• tfactl disable: Disables automatic restarting of the Oracle Trace File Analyzer daemon
4.3.2 Viewing the Status and Configuration of Oracle Trace File Analyzer
View the status of Oracle Trace File Analyzer across all the nodes in the cluster using either the tfactl print status or tfactl print config command.
To view the status and configuration settings of Oracle Trace File Analyzer:
1. To view the status of Oracle Trace File Analyzer on all nodes in the cluster:
tfactl print status
For example:
$ tfactl print status
.-------+---------------+-------+------+------------+----------------------+------------------.
| Host  | Status of TFA | PID   | Port | Version    | Build ID             | Inventory Status |
+-------+---------------+-------+------+------------+----------------------+------------------+
| node1 | RUNNING       | 29591 | 5000 | 12.2.1.0.0 | 12210020160810105317 | COMPLETE         |
| node2 | RUNNING       | 34738 | 5000 | 12.2.1.0.0 | 12210020160810105317 | COMPLETE         |
'-------+---------------+-------+------+------------+----------------------+------------------'
The output displays the status of Oracle Trace File Analyzer across all nodes in the cluster, along with the Oracle Trace File Analyzer version and the port on which it is running.
2. To view the configuration settings of Oracle Trace File Analyzer:
tfactl print config
For example:
$ tfactl print config
.-----------------------------------------------------------------------+------------.
| node1                                                                              |
+-----------------------------------------------------------------------+------------+
| Configuration Parameter                                               | Value      |
+-----------------------------------------------------------------------+------------+
| TFA Version                                                           | 12.2.1.0.0 |
| Java Version                                                          | 1.8        |
| Public IP Network                                                     | true       |
| Automatic Diagnostic Collection                                       | true       |
| Alert Log Scan                                                        | true       |
| Disk Usage Monitor                                                    | true       |
| Managelogs Auto Purge                                                 | false      |
| Trimming of files during diagcollection                               | true       |
| Inventory Trace level                                                 | 1          |
| Collection Trace level                                                | 1          |
| Scan Trace level                                                      | 1          |
| Other Trace level                                                     | 1          |
| Repository current size (MB)                                          | 447        |
| Repository maximum size (MB)                                          | 10240      |
| Max Size of TFA Log (MB)                                              | 50         |
| Max Number of TFA Logs                                                | 10         |
| Max Size of Core File (MB)                                            | 20         |
| Max Collection Size of Core Files (MB)                                | 200        |
| Minimum Free Space to enable Alert Log Scan (MB)                      | 500        |
| Time interval between consecutive Disk Usage Snapshot(minutes)        | 60         |
| Time interval between consecutive Managelogs Auto Purge(minutes)      | 60         |
| Logs older than the time period will be auto purged(days[d]|hours[h]) | 30d        |
| Automatic Purging                                                     | true       |
| Age of Purging Collections (Hours)                                    | 12         |
| TFA IPS Pool Size                                                     | 5          |
'-----------------------------------------------------------------------+------------'
Related Topics:
• Use the tfactl print command to print information from the Berkeley database.
4.3.3 Configuring the Host
You must have root or sudo access to tfactl to add hosts to the Oracle Trace File Analyzer configuration.
To view, add, or remove hosts in the Oracle Trace File Analyzer configuration:
1. To view the list of current hosts in the Oracle Trace File Analyzer configuration:
tfactl print hosts
2. To add a host to the Oracle Trace File Analyzer configuration for the first time:
a. If necessary, install and start Oracle Trace File Analyzer on the new host.
b. From the existing host, synchronize authentication certificates for all hosts by running: tfactl syncnodes
c. If needed, Oracle Trace File Analyzer displays the current node list it is aware of and prompts you to update this node list. Select Y, and then enter the name of the new host.
Oracle Trace File Analyzer contacts Oracle Trace File Analyzer on the new host to synchronize certificates and add each other to their respective hosts lists.
3. To remove a host: tfactl host remove host
4. To add a host whose certificates are already synchronized: tfactl host add host
Oracle Trace File Analyzer generates self-signed SSL certificates during installation. Replace those certificates with one of the following:
• Personal self-signed certificate
• CA-signed certificate
4.3.4 Configuring the Ports
The Oracle Trace File Analyzer daemons in a cluster communicate securely over ports 5000 to 5005.
If the port range is not available on your system, then replace it with the ports available on your system.
The $TFA_HOME/internal/usableports.txt file looks as follows:
$ cat $TFA_HOME/internal/usableports.txt
5000
5001
5002
5003
5004
5005
To change the ports:
1. Stop Oracle Trace File Analyzer on all nodes: tfactl stop
2. Edit the usableports.txt file to replace the ports.
3. Replicate the usableports.txt changes to all cluster nodes.
4. Remove the $TFA_HOME/internal/port.txt file on all nodes.
5. Start Oracle Trace File Analyzer on all nodes: tfactl start
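Editing usableports.txt can be sketched as follows. The replacement range 6000 to 6005 is hypothetical, and the file is written to the current directory for illustration; on a real system, edit $TFA_HOME/internal/usableports.txt on every node while Oracle Trace File Analyzer is stopped:

```shell
# Replace the default 5000-5005 range with 6000-6005 (demo copy; the real
# file lives at $TFA_HOME/internal/usableports.txt on each node)
printf '%s\n' 6000 6001 6002 6003 6004 6005 > usableports.txt
cat usableports.txt
```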
4.3.5 Configuring SSL and SSL Certificates
View and restrict SSL/TLS protocols. Configure Oracle Trace File Analyzer to use a self-signed or CA-signed certificate.
• Configuring SSL/TLS Protocols
The Oracle Trace File Analyzer daemons in a cluster communicate securely using the SSL/TLS protocols.
• Configuring Self-Signed Certificates
Use the Java keytool utility to replace self-signed SSL certificates with personal self-signed certificates.
• Configuring CA-Signed Certificates
Use the Java keytool and openssl utilities to replace self-signed SSL certificates with Certificate Authority (CA) signed certificates.
4.3.5.1 Configuring SSL/TLS Protocols
The Oracle Trace File Analyzer daemons in a cluster communicate securely using the SSL/TLS protocols.
The SSL protocols available for use by Oracle Trace File Analyzer are:
• TLSv1.2
• TLSv1.1
• TLSv1
Oracle Trace File Analyzer always restricts use of the older protocols SSLv3 and SSLv2Hello.
To view and restrict protocols:
1. To view the available and restricted protocols: tfactl print protocols
For example:
$ tfactl print protocols
.---------------------------------------.
| node1 |
+---------------------------------------+
| Protocols |
+---------------------------------------+
| Available : [TLSv1, TLSv1.2, TLSv1.1] |
| Restricted : [SSLv3, SSLv2Hello] |
'---------------------------------------'
2. To restrict the use of certain protocols: tfactl restrictprotocol [-force] protocol
For example:
$ tfactl restrictprotocol TLSv1
4.3.5.2 Configuring Self-Signed Certificates
Use the Java keytool utility to replace self-signed SSL certificates with personal self-signed certificates.
To configure Oracle Trace File Analyzer to use self-signed certificates:
1. Create a private key and keystore file containing the self-signed certificate for the server:
$ keytool -genkey -alias server_full -keyalg RSA -keysize 2048 -validity 18263 -keystore myserver.jks
2. Create a private key and keystore file containing the private key and self-signed certificate for the client:
$ keytool -genkey -alias client_full -keyalg RSA -keysize 2048 -validity 18263 -keystore myclient.jks
3. Export the server public key certificate from the server keystore:
$ keytool -export -alias server_full -file myserver_pub.crt -keystore myserver.jks -storepass password
4. Export the client public key certificate from the client keystore:
$ keytool -export -alias client_full -file myclient_pub.crt -keystore myclient.jks -storepass password
5. Import the server public key certificate into the client keystore:
$ keytool -import -alias server_pub -file myserver_pub.crt -keystore myclient.jks -storepass password
6. Import the client public key certificate into the server keystore:
$ keytool -import -alias client_pub -file myclient_pub.crt -keystore myserver.jks -storepass password
7. Restrict the permissions on the keystores to root read-only:
$ chmod 400 myclient.jks myserver.jks
8. Copy the keystores (jks files) to each node.
9. Configure Oracle Trace File Analyzer to use the new certificates:
$ tfactl set sslconfig
10. Restart the Oracle Trace File Analyzer process to start using the new certificates:
$ tfactl stop
$ tfactl start
4.3.5.3 Configuring CA-Signed Certificates
Use the Java keytool and openssl utilities to replace self-signed SSL certificates with Certificate Authority (CA) signed certificates.
To configure Oracle Trace File Analyzer to use CA-signed certificates:
1. Create a private key for the server request:
$ openssl genrsa -aes256 -out myserver.key 2048
2. Create a private key for the client request:
$ openssl genrsa -aes256 -out myclient.key 2048
3. Create a Certificate Signing Request (CSR) for the server:
$ openssl req -key myserver.key -new -sha256 -out myserver.csr
4. Create a Certificate Signing Request (CSR) for the client:
$ openssl req -key myclient.key -new -sha256 -out myclient.csr
5. Send the resulting CSRs for the client and the server to the relevant signing authority. The signing authority sends back the signed certificates:
• myserver.cert
• myclient.cert
• CA root certificate
6. Convert the certificates to JKS format for the server and the client:
$ openssl pkcs12 -export -out serverCert.pkcs12 -in myserver.cert -inkey myserver.key
$ keytool -v -importkeystore -srckeystore serverCert.pkcs12 -srcstoretype PKCS12 -destkeystore myserver.jks -deststoretype JKS
$ openssl pkcs12 -export -out clientCert.pkcs12 -in myclient.cert -inkey myclient.key
$ keytool -v -importkeystore -srckeystore clientCert.pkcs12 -srcstoretype PKCS12 -destkeystore myclient.jks -deststoretype JKS
7. Import the server public key into the client jks file:
$ keytool -import -v -alias server-ca -file myserver.cert -keystore myclient.jks
8. Import the client public key into the server jks file:
$ keytool -import -v -alias client-ca -file myclient.cert -keystore myserver.jks
9. Import the CA root certificate from the signing authority into the Oracle Trace File Analyzer server certificate keystore:
$ keytool -importcert -trustcacerts -alias inter -file caroot.cert -keystore myserver.jks
10. Restrict the permissions on the keystores to root read-only:
$ chmod 400 myclient.jks myserver.jks
11. Copy the keystores (jks files) to each node.
12. Configure Oracle Trace File Analyzer to use the new certificates:
$ tfactl set sslconfig
13. Restart the Oracle Trace File Analyzer process to start using the new certificates:
$ tfactl stop
$ tfactl start
4.3.6 Managing Collections
Manage directories configured in Oracle Trace File Analyzer and diagnostic collections.
• Including Directories
Add directories to the Oracle Trace File Analyzer configuration to include the directories in diagnostic collections.
• Managing the Size of Collections
Use the Oracle Trace File Analyzer configuration options trimfiles, maxcorefilesize, maxcorecollectionsize, and diagcollect -nocores to reduce the size of collections.
4.3.6.1 Including Directories
Add directories to the Oracle Trace File Analyzer configuration to include the directories in diagnostic collections.
Oracle Trace File Analyzer then stores diagnostic collection metadata about the:
• Directory
• Subdirectories
• Files in the directory and all subdirectories
All Oracle Trace File Analyzer users can add directories they have read access to.
To manage directories:
1. To view the current directories configured in Oracle Trace File Analyzer:
tfactl print directories [ -node all | local | n1,n2,... ]
[ -comp component_name1,component_name2,.. ]
[ -policy exclusions | noexclusions ]
[ -permission public | private ]
2. To add directories:
tfactl directory add dir
[ -public ]
[ -exclusions | -noexclusions | -collectall ]
[ -node all | n1,n2,... ]
3. To remove a directory from being collected:
tfactl directory remove dir [ -node all | n1,n2,... ]
4.3.6.2 Managing the Size of Collections
Use the Oracle Trace File Analyzer configuration options trimfiles, maxcorefilesize, maxcorecollectionsize, and diagcollect -nocores to reduce the size of collections.
To manage the size of collections:
1. To trim files during diagnostic collection: tfactl set trimfiles=ON|OFF
• When set to ON (default), Oracle Trace File Analyzer trims files to include data around the time of the event
• When set to OFF, any file that was written to at the time of the event is collected in its entirety
2. To set the maximum size of each core file to n MB (default 20 MB): tfactl set maxcorefilesize=n
Oracle Trace File Analyzer skips core files that are greater than maxcorefilesize.
3. To set the maximum collection size of core files to n MB (default 200 MB): tfactl set maxcorecollectionsize=n
Oracle Trace File Analyzer skips collecting core files after maxcorecollectionsize is reached.
4. To prevent the collection of core files with diagnostic collections: tfactl diagcollect -nocores
4.3.7 Managing the Repository
Oracle Trace File Analyzer stores all diagnostic collections in the repository.
The repository size is the maximum space Oracle Trace File Analyzer is able to use on disk to store collections.
• Purging the Repository Automatically
• Purging the Repository Manually
4.3.7.1 Purging the Repository Automatically
Oracle Trace File Analyzer closes the repository if:
• Free space in TFA_HOME is less than 100 MB (indexing also stops)
• Free space in ORACLE_BASE is less than 100 MB (indexing also stops)
• Free space in the repository is less than 1 GB
• The current size of the repository is greater than the repository maximum size (reposizeMB)
The Oracle Trace File Analyzer daemon monitors and automatically purges the repository when the free space falls below 1 GB or before closing the repository.
Purging removes collections from largest size through to smallest until the repository has enough space to open.
Oracle Trace File Analyzer automatically purges only the collections that are older than minagetopurge. By default, minagetopurge is 12 hours.
To purge the repository automatically:
1. To change the minimum age to purge: set minagetopurge=number of hours
For example:
$ tfactl set minagetopurge=48
2. Purging the repository automatically is enabled by default. To disable or enable automatic purging: set autopurge=ON|OFF
For example:
$ tfactl set autopurge=ON
3. To change the location of the repository: set repositorydir=dir
For example:
$ tfactl set repositorydir=/opt/mypath
4. To change the size of the repository: set reposizeMB=size
For example:
$ tfactl set reposizeMB=20480
4.3.7.2 Purging the Repository Manually
To purge the repository manually:
1. To view the status of the Oracle Trace File Analyzer repository:
tfactl print repository
2. To view statistics about collections:
tfactl print collections
3. To manually purge collections that are older than a specific time:
tfactl purge -older number[h|d] [-force]
4.4 Analyzing the Problems Identified
Use the tfactl command to perform further analysis against the database when you have identified a problem and need more information.
Figure 4-4 Analysis
Related Topics:
• Analyzing and Searching Recent Log Entries
Use the tfactl analyze command to analyze and search recent log entries.
• Use the tfactl analyze command to obtain analysis of your system by parsing the database, Oracle ASM, and Oracle Grid Infrastructure alert logs, system message logs, OSWatcher Top, and OSWatcher Slabinfo files.
4.5 Manually Collecting Diagnostic Data
This section explains how to manually collect diagnostic data.
• Running On-Demand Default Collections
Use the tfactl diagcollect command to request a collection.
• Running On-Demand Event-Driven SRDC Collections
Use the diagcollect -srdc option to collect diagnostics needed for an Oracle Support Service Request Data Collection (SRDC).
• Running On-Demand Custom Collections
Use the custom collection options to collect diagnostic data from specific nodes, components, and directories.
4.5.1 Running On-Demand Default Collections
Use the tfactl diagcollect command to request a collection.
Oracle Trace File Analyzer stores all collections in the repository directory of the Oracle Trace File Analyzer installation.
The standard time period used for the default collections is the past 12 hours. However, you can adjust it to any other time period.
To run on-demand default collections:
1. To request a default collection: tfactl diagcollect
For example:
$ tfactl diagcollect
Collecting data for the last 12 hours for all components...
Collecting data for all nodes
Collection Id : 20160616115923myserver69
Detailed Logging at :
/u01/app/tfa/repository/collection_Thu_Jun_16_11_59_23_PDT_2016_node_all/diagcollect_20160616115923_myserver69.log
2016/06/16 11:59:27 PDT : Collection Name : tfa_Thu_Jun_16_11_59_23_PDT_2016.zip
2016/06/16 11:59:28 PDT : Collecting diagnostics from hosts :
[myserver70, myserver71, myserver69]
2016/06/16 11:59:28 PDT : Scanning of files for Collection in progress...
2016/06/16 11:59:28 PDT : Collecting additional diagnostic information...
2016/06/16 11:59:33 PDT : Getting list of files satisfying time range
[06/15/2016 23:59:27 PDT, 06/16/2016 11:59:33 PDT]
2016/06/16 11:59:37 PDT : Collecting ADR incident files...
2016/06/16 12:00:32 PDT : Completed collection of additional diagnostic information...
2016/06/16 12:00:39 PDT : Completed Local Collection
2016/06/16 12:00:40 PDT : Remote Collection in Progress...
.--------------------------------------.
| Collection Summary |
+------------+-----------+------+------+
| Host | Status | Size | Time |
+------------+-----------+------+------+
| myserver71 | Completed | 15MB | 64s |
| myserver70 | Completed | 14MB | 67s |
| myserver69 | Completed | 14MB | 71s |
'------------+-----------+------+------'
Logs are being collected to:
/u01/app/tfa/repository/collection_Thu_Jun_16_11_59_23_PDT_2016_node_all
/u01/app/tfa/repository/collection_Thu_Jun_16_11_59_23_PDT_2016_node_all/myserver71.tfa_Thu_Jun_16_11_59_23_PDT_2016.zip
/u01/app/tfa/repository/collection_Thu_Jun_16_11_59_23_PDT_2016_node_all/myserver69.tfa_Thu_Jun_16_11_59_23_PDT_2016.zip
/u01/app/tfa/repository/collection_Thu_Jun_16_11_59_23_PDT_2016_node_all/myserver70.tfa_Thu_Jun_16_11_59_23_PDT_2016.zip
• Adjusting the Time Period for a Collection
By default, diagcollect trims and collects all important log files, from all nodes, for all components, where the file has been updated in the past 12 hours.
Related Topics:
• Use the tfactl diagcollect command to perform on-demand diagnostic collection.
• Oracle Trace File Analyzer Collector On-Demand Diagnostic Collections
Use Oracle Trace File Analyzer Collector to request an on-demand collection if you have determined that a problem has occurred and you are not able to resolve it.
4.5.1.1 Adjusting the Time Period for a Collection
By default, diagcollect trims and collects all important log files, from all nodes, for all components, where the file has been updated in the past 12 hours.
Narrow down the problem further and collect the minimal possible data.
There are four different ways of specifying a time period for the collection. Use whichever is most appropriate in your situation, based on what you know about when the symptoms of the problem occurred and anything relevant that might have contributed to it.
Table 4-2 Adjusting the Time Period for a Collection

-since nh|d
Collect since the previous n hours or days.

-from "yyyy-mm-dd"
Collect from the date and, optionally, the time specified.
Valid date/time formats:
• "Mon/dd/yyyy hh:mm:ss"
• "yyyy-mm-dd hh:mm:ss"
• "yyyy-mm-ddThh:mm:ss"
• "yyyy-mm-dd"

-to "yyyy-mm-dd"
Collect to the date and, optionally, the time specified.
Valid date/time formats:
• "Mon/dd/yyyy hh:mm:ss"
• "yyyy-mm-dd hh:mm:ss"
• "yyyy-mm-ddThh:mm:ss"
• "yyyy-mm-dd"

-for "yyyy-mm-dd"
Collect for the specified date.
Valid date/time formats:
• "Mon/dd/yyyy"
• "yyyy-mm-dd"
To adjust the time period for a collection:
1. To adjust the time period:
tfactl diagcollect -since nh|d
For example:
To do a collection covering the past 2 hours:
$ tfactl diagcollect -since 2h
To do a collection covering the past 3 days:
$ tfactl diagcollect -since 3d
To do a collection for a specific date:
$ tfactl diagcollect -for "2016-08-15"
To do a collection from one particular date to another:
$ tfactl diagcollect -from "2016-08-15" -to "2016-08-17"
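The -from and -to values can also be generated programmatically. The sketch below (assuming GNU date for the -d option) builds a two-day window in the accepted "yyyy-mm-dd hh:mm:ss" format and prints the resulting diagcollect command line rather than running it:

```shell
# Build a two-day collection window in "yyyy-mm-dd hh:mm:ss" format.
# GNU date is assumed for -d; the command is only printed, since
# tfactl is available only on cluster nodes.
FROM=$(date -d '2 days ago' '+%Y-%m-%d %H:%M:%S')
TO=$(date '+%Y-%m-%d %H:%M:%S')
echo "tfactl diagcollect -from \"$FROM\" -to \"$TO\""
```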
Related Topics:
• Use the tfactl diagcollect command to perform on-demand diagnostic collection.
4.5.2 Running On-Demand Event-Driven SRDC Collections
Use the diagcollect -srdc option to collect diagnostics needed for an Oracle Support Service Request Data Collection (SRDC).
Event-driven SRDC collections require components from the Oracle Trace File Analyzer Database Support Tools Bundle.
Download the Oracle Trace File Analyzer Database Support Tools Bundle from My Oracle Support Note 1513912.2.
To run event-driven SRDC collections:
1. To run an event-driven SRDC collection:
tfactl diagcollect -srdc srdc_type
2. To obtain a list of the different types of SRDC collections:
tfactl diagcollect -srdc -help
For example:
$ tfactl diagcollect -srdc ora600
Enter value for EVENT_TIME [YYYY-MM-DD HH24:MI:SS,<RETURN>=ALL] :
Enter value for DATABASE_NAME [<RETURN>=ALL] :
1. Jun/09/2016 09:56:47 : [rdb11204] ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
2. May/19/2016 14:19:30 : [rdb11204] ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
3. May/13/2016 10:14:30 : [rdb11204] ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
4. May/13/2016 10:14:09 : [rdb11204] ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
Please choose the event : 1-4 [1] 1
Selected value is : 1 ( Jun/09/2016 09:56:47 )
Collecting data for local node(s)
Scanning files from Jun/09/2016 03:56:47 to Jun/09/2016 15:56:47
Collection Id : 20160616115820myserver69
Detailed Logging at :
/u01/app/tfa/repository/srdc_ora600_collection_Thu_Jun_16_11_58_20_PDT_2016_node_local/diagcollect_20160616115820_myserver69.log
2016/06/16 11:58:23 PDT : Collection Name : tfa_srdc_ora600_Thu_Jun_16_11_58_20_PDT_2016.zip
2016/06/16 11:58:23 PDT : Scanning of files for Collection in progress...
2016/06/16 11:58:23 PDT : Collecting additional diagnostic information...
2016/06/16 11:58:28 PDT : Getting list of files satisfying time range
[06/09/2016 03:56:47 PDT, 06/09/2016 15:56:47 PDT]
2016/06/16 11:58:30 PDT : Collecting ADR incident files...
2016/06/16 11:59:02 PDT : Completed collection of additional diagnostic information...
2016/06/16 11:59:06 PDT : Completed Local Collection
.---------------------------------------.
| Collection Summary |
+------------+-----------+-------+------+
| Host | Status | Size | Time |
+------------+-----------+-------+------+
| myserver69 | Completed | 7.9MB | 43s |
'------------+-----------+-------+------'
Use the same tagging, naming, and time arguments with SRDC collections as with other collections:
Usage: tfactl diagcollect -srdc srdc_profile [-tag description] [-z filename] [-since nh|d | -from time -to time | -for time]
Related Topics:
• Use the tfactl diagcollect command to perform on-demand diagnostic collection.
• Oracle Trace File Analyzer Collector On-Demand Diagnostic Collections
Use Oracle Trace File Analyzer Collector to request an on-demand collection if you have determined that a problem has occurred and you are not able to resolve it.
• https://support.oracle.com/rs?type=doc&id=1513912.2
4.5.3 Running On-Demand Custom Collections
Use the custom collection options to collect diagnostic data from specific nodes, components, and directories.
By default, Oracle Trace File Analyzer:
• Collects from all nodes in the cluster
• Collects from all Oracle database and Oracle Grid Infrastructure components
• Compresses the collections into the repository directory in a zip file with the following format:
repository/collection_date_time/node_all/node.tfa_date_time.zip
• Copies back all zip files from remote nodes to the initiating node
• Trims the files around the relevant time
• Includes any relevant core files it finds
Also, Oracle Trace File Analyzer Collector collects files from any other directories you want.
• Collecting from Specific Nodes
• Collecting from Specific Components
• Collecting from Specific Directories
• Changing the Collection Name
• Preventing Copying Zip Files and Trimming Files
• Performing Silent Collection
• Preventing Collecting Core Files
• Collecting Incident Packaging Service Packages
Related Topics:
• Use the tfactl diagcollect command to perform on-demand diagnostic collection.
• Use the tfactl ips command to collect Automatic Diagnostic Repository diagnostic data.
• Oracle Trace File Analyzer Collector On-Demand Diagnostic Collections
Use Oracle Trace File Analyzer Collector to request an on-demand collection if you have determined that a problem has occurred and you are not able to resolve it.
4.5.3.1 Collecting from Specific Nodes
To collect from specific nodes:
1. To collect from specific nodes:
tfactl diagcollect -node list_of_nodes
For example:
$ tfactl diagcollect -last 1d -node myserver65
4.5.3.2 Collecting from Specific Components
To collect from specific components:
1. To collect from specific components:
tfactl diagcollect component
For example:
To trim and collect all files from the databases hrdb and fdb in the last 1 day:
$ tfactl diagcollect -database hrdb,fdb -last 1d
To trim and collect all CRS files, operating system logs, and CHMOS/OSW data from node1 and node2 updated in the last 6 hours:
$ tfactl diagcollect -crs -os -node node1,node2 -last 6h
To trim and collect all Oracle ASM logs from node1 updated between the from and to times:
$ tfactl diagcollect -asm -node node1 -from "2016-08-15" -to "2016-08-17"
Following are the available component options.

Table 4-3 Component Options

-database database_names
Collects database logs from the databases specified in a comma-separated list.
-asm
Collects Oracle ASM logs.
-crsclient
Collects client logs that are under GIBASE/diag/clients.
-dbclient
Collects client logs that are under DB ORABASE/diag/clients.
-dbwlm
Collects DBWLM logs.
-tns
Collects TNS logs.
-rhp
Collects RHP logs.
-procinfo
Gathers stack and fd from /proc for all processes.
-afd
Collects AFD logs.
-crs
Collects CRS logs.
-wls
Collects WLS logs.
-emagent
Collects EMAGENT logs.
-oms
Collects OMS logs.
-ocm
Collects OCM logs.
-emplugins
Collects EMPLUGINS logs.
-em
Collects EM logs.
-acfs
Collects ACFS logs and data.
-install
Collects Oracle installation related files.
-cfgtools
Collects CFGTOOLS logs.
-os
Collects operating system files, such as /var/log/messages.
-ashhtml
Generates an ASH HTML report.
-ashtext
Generates an ASH text report.
-awrhtml
Collects AWRHTML logs.
4.5.3.3 Collecting from Specific Directories
To collect from specific directories:
1. To include all files, no matter the type or time last updated, from other directories in the collection:
tfactl diagcollect -collectdir dir1,dir2,...dirn
For example:
To trim and collect all CRS files updated in the last 12 hours, as well as all files from /tmp_dir1 and /tmp_dir2 at the initiating node:
$ tfactl diagcollect -crs -collectdir /tmp_dir1,/tmp_dir2
2. To collect from all directories:
tfactl diagcollect -collectalldirs
Oracle Trace File Analyzer collects from all files in the directory irrespective of time or time range.
For example:
To collect all standard trace and diagnostic files updated in the past day, plus all files from any collectall directories, no matter when they were updated:
$ tfactl diagcollect -since 1d -collectalldirs
Related Topics:
• Use the tfactl diagcollect command to perform on-demand diagnostic collection.
4.5.3.4 Changing the Collection Name
To change the collection name:
1. To use your own naming to organize collections:
-tag tagname
The files are collected into a tagname directory inside the repository.
For example:
$ tfactl diagcollect -since 1h -tag MyTagName
Collecting data for all nodes
....
....
Logs are being collected to: /scratch/app/crsusr/tfa/repository/MyTagName
/scratch/app/crsusr/tfa/repository/MyTagName/rws1290666.tfa_Mon_Aug_22_05_26_17_PDT_2016.zip
/scratch/app/crsusr/tfa/repository/MyTagName/rws1290665.tfa_Mon_Aug_22_05_26_17_PDT_2016.zip
2. To rename the zip file:
-z zip_name
For example:
$ tfactl diagcollect -since 1h -z MyCollectionName.zip
Collecting data for all nodes
....
....
Logs are being collected to: /scratch/app/crsusr/tfa/repository/collection_Mon_Aug_22_05_13_41_PDT_2016_node_all
/scratch/app/crsusr/tfa/repository/collection_Mon_Aug_22_05_13_41_PDT_2016_node_all/myserver65.tfa_MyCollectionName.zip
/scratch/app/crsusr/tfa/repository/collection_Mon_Aug_22_05_13_41_PDT_2016_node_all/myserver66.tfa_MyCollectionName.zip
Related Topics:
• Use the tfactl diagcollect command to perform on-demand diagnostic collection.
4.5.3.5 Preventing Copying Zip Files and Trimming Files
By default, Oracle Trace File Analyzer Collector:
• Copies back all zip files from remote nodes to the initiating node
• Trims files around the relevant time
To prevent copying zip files and trimming files:
1. To prevent copying the zip file back to the initiating node:
-nocopy
For example:
$ tfactl diagcollect -last 1d -nocopy
2. To avoid trimming files:
-notrim
For example:
$ tfactl diagcollect -last 1d -notrim
4.5.3.6 Performing Silent Collection
1. To initiate a silent collection:
-silent
The diagcollect command is submitted as a background process.
For example:
$ tfactl diagcollect -last 1d -silent
4.5.3.7 Preventing Collecting Core Files
1. To prevent core files being included:
-nocores
For example:
$ tfactl diagcollect -last 1d -nocores
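The options in the preceding sections can be combined, assuming they compose like the other diagcollect flags shown in this chapter; for instance, a minimal-footprint background collection might use all four. The sketch below only prints the assembled command (the one-day window is arbitrary):

```shell
# Combine -nocopy, -notrim, -nocores, and -silent into one command
# line; printed rather than executed, since tfactl is only present
# on hosts with Oracle Trace File Analyzer installed.
OPTS="-last 1d -nocopy -notrim -nocores -silent"
echo "tfactl diagcollect $OPTS"
```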
4.5.3.8 Collecting Incident Packaging Service Packages
Oracle Trace File Analyzer is capable of calling the Incident Packaging Service (IPS), which collects files from the Automatic Diagnostic Repository (ADR).
1. To run Incident Packaging Service:
$ tfactl ips
2. To collect with Incident Packaging Service:
$ tfactl diagcollect -ips
3. To show all Incident Packaging Service incidents:
$ tfactl ips show incidents
4. To show all Incident Packaging Service problems:
$ tfactl ips show problems
5. To show all Incident Packaging Service packages:
$ tfactl ips show package
6. To see the available diagcollect Incident Packaging Service options:
$ tfactl diagcollect -ips -h
7. To run Incident Packaging Service collection interactively:
$ tfactl diagcollect -ips
When you run interactively, you are prompted to select the Automatic Diagnostic Repository area to collect from.
8. To run Incident Packaging Service collection in silent mode:
$ tfactl diagcollect -ips -adrbasepath path -adrhomepath path
Use the standard diagcollect options to limit the scope of Incident Packaging Service collection. For example, to collect Incident Packaging Service packages for the given ADR basepath/homepath in the last hour on the local node:
$ tfactl diagcollect -ips -adrbasepath /scratch/app/oragrid -adrhomepath diag/crs/hostname/crs -since 1h -node local
9. To collect Automatic Diagnostic Repository details about a specific incident ID:
$ tfactl diagcollect -ips -incident incident_id -node local
10. To collect Automatic Diagnostic Repository details about a specific problem ID:
$ tfactl diagcollect -ips -problem problem_id -node local
To change the contents of the Incident Packaging Service package, you can initiate collection, pause it, manipulate the package, and then resume collection.
11. To collect Automatic Diagnostic Repository details about a specific incident ID on the local node and pause for Incident Packaging Service package manipulation:
$ tfactl diagcollect -ips -incident incident_id -manageips -node local
12. To print all paused Oracle Trace File Analyzer Incident Packaging Service collections:
$ tfactl print suspendedips
13. To resume a suspended Oracle Trace File Analyzer Incident Packaging Service collection:
$ tfactl diagcollect -resumeips collection_id
Related Topics:
• Use the tfactl diagcollect command to perform on-demand diagnostic collection.
• Use the tfactl ips command to collect Automatic Diagnostic Repository diagnostic data.
• Use the tfactl print command to print information from the Berkeley database.
4.6 Analyzing and Searching Recent Log Entries
Use the tfactl analyze command to analyze and search recent log entries.
To analyze and search recent log entries:
1. To analyze all important recent log entries:
tfactl analyze -since n[h|d]
Specify the period of time to analyze in either hours or days.
For example:
tfactl analyze -since 14d
The command output shows you a summary of errors found in the logs during the period specified.
2. To search for all occurrences of a particular message or error code over a specified period of hours or days:
tfactl analyze -search "message" -since n[h|d]
For example:
$ tfactl analyze -search "ORA-006" -since 14d
Related Topics:
• Analyzing the Problems Identified
Use the tfactl command to perform further analysis against the database when you have identified a problem and you need more information.
• Use the tfactl analyze command to obtain analysis of your system by parsing the database, Oracle ASM, and Oracle Grid Infrastructure alert logs, system message logs, OSWatcher Top, and OSWatcher Slabinfo files.
4.7 Managing Oracle Database and Oracle Grid Infrastructure Diagnostic Data
This section explains how to manage Oracle Database and Oracle Grid Infrastructure diagnostic data and disk usage snapshots.
• Managing Automatic Diagnostic Repository Log and Trace Files
Use the managelogs command to manage Automatic Diagnostic Repository log and trace files.
• Managing Disk Usage Snapshots
Use tfactl commands to manage Oracle Trace File Analyzer disk usage snapshots.
• Purging Oracle Trace File Analyzer Logs Automatically
Use these tfactl commands to manage the log file purge policy for Oracle Trace File Analyzer log files.
4.7.1 Managing Automatic Diagnostic Repository Log and Trace Files
Use the managelogs command to manage Automatic Diagnostic Repository log and trace files.
The -purge command option removes files managed by Automatic Diagnostic Repository. This command clears files from "ALERT", "INCIDENT", "TRACE", "CDUMP", "HM", "UTSCDMP", and "LOG" under the diagnostic destinations. The -purge command also provides details about the change in the file system space.
If the diagnostic destinations contain large numbers of files, then the command runs for a while. Check the removal of files in progress from the corresponding directories.
To remove files, you must have operating system privileges over the corresponding diagnostic destinations.
To manage Automatic Diagnostic Repository log and trace files:
1. To limit purge or show operations to only files older than a specific time:
$ tfactl managelogs -older nm|h|d
Files from the past n [d]ays, n [h]ours, or n [m]inutes.
For example:
$ tfactl managelogs -purge -older 30d -dryrun
2. To get an estimate of how many files are removed and how much space is freed, use the -dryrun option.
For example:
$ tfactl managelogs -purge -older 30d -dryrun
3. To remove files and clean disk space:
For example:
$ tfactl managelogs -purge -older 30d
$ tfactl managelogs -purge -older 30d -gi
$ tfactl managelogs -purge -older 30d -database
4. To view the space usage of individual diagnostic destinations:
For example:
$ tfactl managelogs -show usage
$ tfactl managelogs -show usage -gi
$ tfactl managelogs -show usage -database
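A cautious pattern is to run the -dryrun estimate first and purge only after reviewing it. The loop below simply prints the two commands in order; the 30-day threshold is illustrative, and nothing is executed because tfactl is only present on managed hosts:

```shell
# Print a dry-run estimate command first, then the real purge command.
AGE="30d"
for FLAGS in "-purge -older $AGE -dryrun" "-purge -older $AGE"; do
  echo "tfactl managelogs $FLAGS"
done
```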
4.7.2 Managing Disk Usage Snapshots
Use tfactl commands to manage Oracle Trace File Analyzer disk usage snapshots.
Oracle Trace File Analyzer automatically monitors disk usage, records snapshots, and stores the snapshots under tfa/repository/suptools/node/managelogs/usage_snapshot/.
By default, the time interval between snapshots is 60 minutes.
To manage disk usage snapshots:
1. To change the default time interval for snapshots:
$ tfactl set diskUsageMonInterval=minutes
where minutes is the number of minutes between snapshots.
2. To turn the disk usage monitor on or off:
$ tfactl set diskUsageMon=ON|OFF
4.7.3 Purging Oracle Trace File Analyzer Logs Automatically
Use these tfactl commands to manage the log file purge policy for Oracle Trace File Analyzer log files.
Automatic purging is enabled by default on a Domain Service Cluster (DSC), and disabled by default elsewhere. When automatic purging is enabled, every 60 minutes, Oracle Trace File Analyzer automatically purges logs that are older than 30 days.
To purge Oracle Trace File Analyzer logs automatically:
1. To turn automatic purging on or off:
$ tfactl set manageLogsAutoPurge=ON|OFF
2. To adjust the age of logs to purge:
$ tfactl set manageLogsAutoPurgePolicyAge=nd|h
3. To adjust the frequency of purging:
$ tfactl set manageLogsAutoPurgeInterval=minutes
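For example, a site that wants a longer retention and a daily purge pass would set the three properties together. The sketch below only prints the settings; the 60-day age and 1440-minute interval are illustrative values, not recommendations:

```shell
# Print an example auto-purge policy: enabled, 60-day retention,
# purge pass every 1440 minutes (daily). Values are illustrative.
for SETTING in "manageLogsAutoPurge=ON" \
               "manageLogsAutoPurgePolicyAge=60d" \
               "manageLogsAutoPurgeInterval=1440"; do
  echo "tfactl set $SETTING"
done
```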
4.8 Upgrading Oracle Trace File Analyzer Collector by Applying a Patch Set Update
Always upgrade to the latest version whenever possible to include bug fixes, new features, and optimizations.
Applying the patch set update automatically updates Oracle Trace File Analyzer. The latest version of Oracle Trace File Analyzer is shipped with each new database and Oracle Grid Infrastructure patch set update. The patch set update version is normally three months behind the version that is released on My Oracle Support.
When a new patch set update is applied to the Oracle Grid Infrastructure home or database home, Oracle Trace File Analyzer upgrades automatically if the version in the patch set update is greater than the version that is currently installed.
The latest Oracle Trace File Analyzer version is available on My Oracle Support Note 1513912.2, three months before it is available in a patch set update.
When updating Oracle Trace File Analyzer through a patch set update, the Oracle Trace File Analyzer Database Support Tools Bundle is not updated automatically. Download and update the support tools from My Oracle Support Note 1513912.2.
Related Topics:
• https://support.oracle.com/rs?type=doc&id=1513912.2
4.9 Troubleshooting Oracle Trace File Analyzer
Enable specific trace levels when reproducing a problem to obtain sufficient diagnostics.
To quickly enable or disable the correct trace levels, use the dbglevel option. All the required trace level settings are organized into problem-specific trace profiles.
To set trace levels:
1. To set a trace profile:
tfactl dbglevel -set profile
5 Proactively Detecting and Diagnosing Performance Issues for Oracle RAC
Oracle Cluster Health Advisor provides system and database administrators with early warning of pending performance issues, and root causes and corrective actions for Oracle RAC databases and cluster nodes. Use Oracle Cluster Health Advisor to increase availability and performance management.
Oracle Cluster Health Advisor estimates an expected value for each observed input based on the default model, which is a trained, calibrated model based on a normal operational period of the target system. Oracle Cluster Health Advisor then performs anomaly detection for each input based on the difference between observed and expected values. If sufficient inputs associated with a specific problem are abnormal, then Oracle Cluster Health Advisor raises a warning and generates an immediate targeted diagnosis and corrective action.
Oracle Cluster Health Advisor stores the analysis results, along with diagnosis information, corrective action, and metric evidence for later triage, in the Grid Infrastructure Management Repository (GIMR). Oracle Cluster Health Advisor also sends warning messages to Enterprise Manager Cloud Control using the Oracle Clusterware event notification protocol.
Topics:
• Oracle Cluster Health Advisor Architecture
Oracle Cluster Health Advisor runs as a highly available cluster resource, ochad, on each node in the cluster.
• Monitoring the Oracle Real Application Clusters (Oracle RAC) Environment with Oracle Cluster Health Advisor
Oracle Cluster Health Advisor is automatically provisioned on each node by default when Oracle Grid Infrastructure is installed for Oracle Real Application Clusters (Oracle RAC) or Oracle RAC One Node database.
• Using Cluster Health Advisor for Health Diagnosis
Oracle Cluster Health Advisor raises and clears problems autonomously and stores the history in the Grid Infrastructure Management Repository (GIMR).
• Calibrating an Oracle Cluster Health Advisor Model for a Cluster Deployment
As shipped with default node and database models, Oracle Cluster Health Advisor is designed not to generate false warning notifications.
• Viewing the Details for an Oracle Cluster Health Advisor Model
Use the chactl query model command to view the model details.
• Managing the Oracle Cluster Health Advisor Repository
The Oracle Cluster Health Advisor repository stores the historical records of cluster host problems, database problems, and associated metric evidence, along with models.
• Viewing the Status of Cluster Health Advisor
SRVCTL commands are the tools that offer total control on managing the life cycle of Oracle Cluster Health Advisor as a highly available service.
Related Topics:
• Introduction to Oracle Cluster Health Advisor
Oracle Cluster Health Advisor continuously monitors cluster nodes and Oracle RAC databases for performance and availability issue precursors to provide early warning of problems before they become critical.
5.1 Oracle Cluster Health Advisor Architecture
Oracle Cluster Health Advisor runs as a highly available cluster resource, ochad, on each node in the cluster. Each Oracle Cluster Health Advisor daemon (ochad) monitors the operating system on the cluster node and, optionally, each Oracle Real Application Clusters (Oracle RAC) database instance on the node.
Figure 5-1 Oracle Cluster Health Advisor Architecture
The ochad daemon receives operating system metric data from the Cluster Health Monitor and gets Oracle RAC database instance metrics from a memory-mapped file. The daemon does not require a connection to each database instance. This data, along with the selected model, is used in the Health Prognostics Engine of Oracle Cluster Health Advisor for both the node and each monitored database instance in order to analyze their health multiple times a minute.
5.2 Monitoring the Oracle Real Application Clusters (Oracle RAC) Environment with Oracle Cluster Health Advisor
Oracle Cluster Health Advisor is automatically provisioned on each node by default when Oracle Grid Infrastructure is installed for Oracle Real Application Clusters (Oracle RAC) or Oracle RAC One Node database.
Oracle Cluster Health Advisor does not require any additional configuration. The credentials of the OCHAD daemon user in the Grid Infrastructure Management Repository (GIMR) are securely and randomly generated and stored in the Oracle Grid Infrastructure Credential Store.
When Oracle Cluster Health Advisor detects an Oracle Real Application Clusters (Oracle RAC) or Oracle RAC One Node database instance as running, Oracle Cluster Health Advisor autonomously starts monitoring the cluster nodes. Use CHACTL while logged in as the Grid user to turn on monitoring of the database.
To monitor the Oracle Real Application Clusters (Oracle RAC) environment:
1. To monitor a database, run the following command:
$ chactl monitor database -db db_unique_name
Oracle Cluster Health Advisor monitors all instances of the Oracle Real Application Clusters (Oracle RAC) or Oracle RAC One Node database using the default model. Oracle Cluster Health Advisor cannot monitor single-instance Oracle databases, even if the single-instance Oracle databases share the same cluster as Oracle Real Application Clusters (Oracle RAC) databases.
Oracle Cluster Health Advisor preserves database monitoring status across cluster restarts because Oracle Cluster Health Advisor stores the status information in the GIMR. Each database instance is monitored independently, both across Oracle Real Application Clusters (Oracle RAC) database nodes and when more than one database runs on a single node.
2. To stop monitoring a database, run the following command:
$ chactl unmonitor database -db db_unique_name
Oracle Cluster Health Advisor stops monitoring all instances of the specified database. However, Oracle Cluster Health Advisor does not delete any data or problems until they age out beyond the retention period.
3. To check the monitoring status of all cluster nodes and databases, run the following command:
$ chactl status
Use the -verbose option to see more details, such as the models used for the nodes and for each database.
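Putting the three steps together, a monitoring lifecycle for a hypothetical database with db_unique_name sales might look like the following sketch. The commands are printed rather than executed, since chactl is available only on cluster nodes and must be run as the Grid user; the database name and the placement of -verbose after chactl status are illustrative assumptions:

```shell
# Print the monitor / status / unmonitor sequence for a hypothetical
# database; "sales" is an invented db_unique_name for illustration.
DB="sales"
for ARGS in "monitor database -db $DB" \
            "status -verbose" \
            "unmonitor database -db $DB"; do
  echo "chactl $ARGS"
done
```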
5.3 Using Cluster Health Advisor for Health Diagnosis
Oracle Cluster Health Advisor raises and clears problems autonomously and stores the history in the Grid Infrastructure Management Repository (GIMR).
The Oracle Grid Infrastructure user can query the stored information using CHACTL.
To query the diagnostic data:
1. To query currently open problems, run the following command:
chactl query diagnosis -db db_unique_name -start time -end time
In the syntax example, db_unique_name is the name of your database instance. You also specify the start time and end time for which you want to retrieve data. Specify the date and time in the YYYY-MM-DD HH24:MI:SS format.
2. Use the -htmlfile file_name option to save the output in HTML format.
Example 5-1 Cluster Health Advisor Output Examples in Text and HTML Format
This example shows the default text output for the chactl query diagnosis command for a database named oltpacdb.
$ chactl query diagnosis -db oltpacdb -start "2016-02-01 02:52:50" -end "2016-02-01 03:19:15"
2016-02-01 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2016-02-01 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2016-02-01 02:52:15.0 Database oltpacdb DB CPU Utilization (oltpacdb_2) [detected]
2016-02-01 02:52:50.0 Database oltpacdb DB CPU Utilization (oltpacdb_1) [detected]
2016-02-01 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2016-02-01 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
Problem: DB Control File IO Performance
Description: CHA has detected that reads or writes to the control files are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that reads or writes to the control files were slow because of an increase in disk IO. The slow control file reads and writes may have an impact on checkpoint and Log Writer (LGWR) performance.
Action: Separate the control files from other database files and move them to faster disks or Solid State Devices.

Problem: DB CPU Utilization
Description: CHA detected larger than expected CPU utilization for this database.
Cause: The Cluster Health Advisor (CHA) detected an increase in database CPU utilization because of an increase in the database workload.
Action: Identify the CPU-intensive queries by using the Automatic Database Diagnostic Monitor (ADDM) and follow the recommendations given there. Limit the number of CPU-intensive queries, or relocate sessions to less busy machines. Add CPUs if the CPU capacity is insufficient to support the load without a performance degradation or effects on other databases.
Problem: DB Log File Switch
Description: CHA detected that database sessions are waiting longer than expected for log switch completions.
Cause: The Cluster Health Advisor (CHA) detected high contention during log switches because the redo log files were small and the redo logs switched frequently.
Action: Increase the size of the redo logs.
The timestamp displays date and time when the problem was detected on a specific host or database.
Note:
The same problem can occur on different hosts and at different times, yet the diagnosis shows complete details of the problem and its potential impact.
Each problem also shows targeted corrective or preventive actions.
Here is an example of what the output looks like in the HTML format.
$ chactl query diagnosis -start "2016-07-03 20:50:00" -end "2016-07-04 03:50:00" -htmlfile ~/chaprob.html
Figure 5-2 Cluster Health Advisor Diagnosis HTML Output
Related Topics:
• Use the chactl query diagnosis command to return problems and diagnosis, and suggested corrective actions associated with the problem for specific cluster nodes or Oracle Real Application Clusters (Oracle RAC) databases.
5.4 Calibrating an Oracle Cluster Health Advisor Model for a Cluster Deployment
As shipped with default node and database models, Oracle Cluster Health Advisor is designed not to generate false warning notifications.
You can increase the sensitivity and accuracy of the Oracle Cluster Health Advisor models for a specific workload using the chactl calibrate command.
Oracle recommends that a minimum of 6 hours of data be available and that both the cluster and databases use the same time range for calibration.
The chactl calibrate command analyzes a user-specified time interval that includes all workload phases operating normally. This data is collected while Oracle Cluster Health Advisor is monitoring the cluster and all the databases for which you want to calibrate.
1. To check if sufficient data is available, run the chactl query calibration command.
If 720 or more records are available, then Oracle Cluster Health Advisor successfully performs the calibration. The calibration function may not consider some data records to be normally occurring for the workload profile being used. In this case, filter the data by using the KPISET parameters in both the query calibration command and the calibrate command.
For example:
$ chactl query calibration -db oltpacdb -timeranges 'start=2016-07-26 01:00:00,end=2016-07-26 02:00:00,start=2016-07-26 03:00:00,end=2016-07-26 04:00:00' -kpiset 'name=CPUPERCENT min=20 max=40, name=IOTHROUGHPUT min=500 max=9000' -interval 2
2. Start the calibration and store the model under a user-specified name for the specified date and time range.
For example:
$ chactl calibrate cluster -model weekday -timeranges 'start=2016-07-03 20:50:00,end=2016-07-04 15:00:00'
3. After completing the calibration, Oracle Cluster Health Advisor automatically stores the new model in GIMR. Use the new model to monitor the cluster as follows:
$ chactl monitor cluster -model weekday
Example 5-2 Output for the chactl query calibration command
Database name : oltpacdb
Start time : 2016-07-26 01:03:10
End time : 2016-07-26 01:57:25
Total Samples : 120
Percentage of filtered data : 8.32%
The number of data samples may not be sufficient for calibration.
1) Disk read (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
4.96 0.20 8.98 0.06 25.68
<25 <50 <75 <100 >=100
97.50% 2.50% 0.00% 0.00% 0.00%
2) Disk write (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
27.73 9.72 31.75 4.16 109.39
<50 <100 <150 <200 >=200
73.33% 22.50% 4.17% 0.00% 0.00%
3) Disk throughput (ASM) (IO/sec)
MEAN MEDIAN STDDEV MIN MAX
2407.50 1500.00 1978.55 700.00 7800.00
<5000 <10000 <15000 <20000 >=20000
83.33% 16.67% 0.00% 0.00% 0.00%
4) CPU utilization (total) (%)
MEAN MEDIAN STDDEV MIN MAX
21.99 21.75 1.36 20.00 26.80
<20 <40 <60 <80 >=80
0.00% 100.00% 0.00% 0.00% 0.00%
5) Database time per user call (usec/call)
MEAN MEDIAN STDDEV MIN MAX
267.39 264.87 32.05 205.80 484.57
<10000000 <20000000 <30000000 <40000000 <50000000 <60000000 <70000000 >=70000000
100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Database name : oltpacdb
Start time : 2016-07-26 03:00:00
End time : 2016-07-26 03:53:30
Total Samples : 342
Percentage of filtered data : 23.72%
The number of data samples may not be sufficient for calibration.
1) Disk read (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
12.18 0.28 16.07 0.05 60.98
<25 <50 <75 <100 >=100
64.33% 34.50% 1.17% 0.00% 0.00%
2) Disk write (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
57.57 51.14 34.12 16.10 135.29
<50 <100 <150 <200 >=200
49.12% 38.30% 12.57% 0.00% 0.00%
3) Disk throughput (ASM) (IO/sec)
MEAN MEDIAN STDDEV MIN MAX
5048.83 4300.00 1730.17 2700.00 9000.00
<5000 <10000 <15000 <20000 >=20000
63.74% 36.26% 0.00% 0.00% 0.00%
4) CPU utilization (total) (%)
MEAN MEDIAN STDDEV MIN MAX
23.10 22.80 1.88 20.00 31.40
<20 <40 <60 <80 >=80
0.00% 100.00% 0.00% 0.00% 0.00%
5) Database time per user call (usec/call)
MEAN MEDIAN STDDEV MIN MAX
744.39 256.47 2892.71 211.45 45438.35
<10000000 <20000000 <30000000 <40000000 <50000000 <60000000 <70000000 >=70000000
100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
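The 720-record sufficiency check in step 1 can be mechanized. The following shell sketch (not an Oracle tool; the awk one-liner and the abbreviated sample input are illustrative) scans chactl query calibration output for the Total Samples lines and flags time ranges that fall below the minimum:

```shell
# Flag calibration time ranges whose sample count is below the 720-record
# minimum that Oracle Cluster Health Advisor needs for calibration.
# Input mimics "chactl query calibration" output, abbreviated.
awk -F':' '/^Total Samples/ {
    gsub(/ /, "", $2); n = $2 + 0; i++
    verdict = (n >= 720) ? "sufficient" : "insufficient"
    print "range " i ": " n " samples - " verdict
}' <<'EOF'
Total Samples : 120
Total Samples : 342
EOF
```

Both time ranges in Example 5-2 fall short of 720 records, which matches the "may not be sufficient" warnings shown in the output above.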
Related Topics:
• Use the chactl calibrate command to create a new model that has greater sensitivity and accuracy.
• Use the chactl query calibration command to view detailed information about the calibration data of a specific target.
• The Oracle Cluster Health Advisor commands enable the Oracle Grid Infrastructure user to administer basic monitoring functionality on the targets.
5.5 Viewing the Details for an Oracle Cluster Health Advisor Model
Use the chactl query model command to view the model details.
1. You can review the details of an Oracle Cluster Health Advisor model at any time using the chactl query model command.
For example:
$ chactl query model -name weekday
Model: weekday
Target Type: CLUSTERWARE
Version: OS12.2_V14_0.9.8
OS Calibrated on: Linux amd64
Calibration Target Name: MYCLUSTER
Calibration Date: 2016-07-05 01:13:49
Calibration Time Ranges: start=2016-07-03 20:50:00,end=2016-07-04 15:00:00
Calibration KPIs: not specified
You can also rename, import, export, and delete the models.
5.6 Managing the Oracle Cluster Health Advisor Repository
The Oracle Cluster Health Advisor repository stores the historical records of cluster host problems, database problems, and associated metric evidence, along with models.
The Oracle Cluster Health Advisor repository is used to diagnose and triage periodic problems. By default, the repository is sized to retain data for 16 targets (nodes and database instances) for 72 hours. If the number of targets increases, then the retention time is automatically decreased. Oracle Cluster Health Advisor generates warning messages when the retention time goes below 72 hours, and stops monitoring and generates a critical alert when the retention time goes below 24 hours.
Use CHACTL commands to manage the repository and set the maximum retention time.
1. To retrieve the repository details, use the following command:
$ chactl query repository
For example, running the command mentioned earlier shows the following output:
specified max retention time(hrs) : 72
available retention time(hrs) : 212
available number of entities : 2
allocated number of entities : 0
total repository size(gb) : 2.00
allocated repository size(gb) : 0.07
2. To set the maximum retention time in hours, based on the current number of targets being monitored, use the following command:
$ chactl set maxretention -time number_of_hours
For example:
$ chactl set maxretention -time 80
max retention successfully set to 80 hours
Note:
The maxretention setting limits the oldest data retained in the repository, but is not guaranteed to be maintained if the number of monitored targets increases. In this case, if the combination of monitored targets and number of hours is not sufficient, then increase the size of the Oracle Cluster Health Advisor repository.
3. To increase the size of the Oracle Cluster Health Advisor repository, use the chactl resize repository command.
For example, to resize the repository to support 32 targets using the currently set maximum retention time, use the following command:
$ chactl resize repository -entities 32
repository successfully resized for 32 targets
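The trade-off between monitored targets and retention can be sketched with simple arithmetic. This is an illustrative model only: it assumes retention scales inversely with the target count from the documented 16-target/72-hour default, which Oracle does not publish as an exact formula.

```shell
# Illustrative estimate: if a fixed-size repository retains 16 targets for
# 72 hours, assume retention scales inversely with the target count.
# (The linear model is an assumption for illustration, not Oracle's formula.)
baseline_targets=16
baseline_hours=72
targets=32
est_hours=$(( baseline_targets * baseline_hours / targets ))
echo "estimated retention for $targets targets: $est_hours hours"
if [ "$est_hours" -lt 24 ]; then
  echo "below 24 hours: CHA would stop monitoring and raise a critical alert"
elif [ "$est_hours" -lt 72 ]; then
  echo "below 72 hours: CHA would generate warning messages"
fi
```

Under this model, doubling the targets to 32 halves the estimate to 36 hours, which lands in the warning band described above; resizing the repository restores headroom.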
5.7 Viewing the Status of Cluster Health Advisor
SRVCTL commands offer complete control over managing the life cycle of Oracle Cluster Health Advisor as a highly available service.
Use SRVCTL commands to check the status and configuration of the Oracle Cluster Health Advisor service on any active hub or leaf nodes of the Oracle RAC cluster.
Note:
A target is monitored only if it is running and the Oracle Cluster Health Advisor service is also running on the host node where the target exists.
1. To check the status of the Oracle Cluster Health Advisor service on all nodes in the Oracle RAC cluster:
srvctl status cha [-help]
For example:
# srvctl status cha
Cluster Health Advisor is running on nodes racNode1, racNode2.
Cluster Health Advisor is not running on nodes racNode3, racNode4.
2. To check if the Oracle Cluster Health Advisor service is enabled or disabled on all nodes in the Oracle RAC cluster:
srvctl config cha [-help]
For example:
# srvctl config cha
Cluster Health Advisor is enabled on nodes racNode1, racNode2.
Cluster Health Advisor is not enabled on nodes racNode3, racNode4.
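When scripting health checks across many clusters, the node lists in this output can be pulled out with standard text tools. A minimal sketch (the sed pattern and sample text are illustrative, keyed to the message format shown above):

```shell
# Extract the nodes where Cluster Health Advisor is NOT running from
# "srvctl status cha" style output (format as shown in the example above).
sed -n 's/^Cluster Health Advisor is not running on nodes \(.*\)\.$/\1/p' <<'EOF'
Cluster Health Advisor is running on nodes racNode1, racNode2.
Cluster Health Advisor is not running on nodes racNode3, racNode4.
EOF
```

In a live environment you would pipe the real command output into the same filter instead of the here-document.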
6 Resolving Memory Stress
Memory Guard continuously monitors and ensures the availability of cluster nodes by preventing the nodes from being evicted when the nodes are stressed due to lack of memory.
Topics:
• Memory Guard automatically monitors cluster nodes to prevent node stress caused by the lack of memory.
• Memory Guard is implemented as a daemon running as an MBean in a J2EE container managed by Cluster Ready Services (CRS).
• Enabling Memory Guard in Oracle Real Application Clusters (Oracle RAC)
Memory Guard is automatically enabled when you install Oracle Grid Infrastructure for an Oracle Real Application Clusters (Oracle RAC) or an Oracle RAC One Node database.
• Use of Memory Guard in Oracle Real Application Clusters (Oracle RAC)
Memory Guard autonomously detects and monitors Oracle Real Application Clusters (Oracle RAC) or Oracle RAC One Node databases when they are open.
Related Topics:
• Memory Guard is an Oracle Real Application Clusters (Oracle RAC) environment feature to monitor the cluster nodes to prevent node stress caused by the lack of memory.
6.1 Overview of Memory Guard
Memory Guard automatically monitors cluster nodes to prevent node stress caused by the lack of memory.
Memory Guard autonomously collects metrics on memory usage for every node in an Oracle Real Application Clusters (Oracle RAC) environment. Memory Guard gets the information from Cluster Health Monitor. If Memory Guard determines that a node has insufficient memory, then Memory Guard performs the following actions:
• Prevents new database sessions from being created on the afflicted node
• Stops all CRS-managed services transactionally on the node, allowing the existing workload on the node to complete and free their memory
When Memory Guard determines that the memory stress has been relieved, it restores connectivity to the node, allowing new sessions to be created on that node.
Running out of memory can result in failed transactions or, in extreme cases, a restart of the node resulting in the loss of availability and resources for your applications.
6.2 Memory Guard Architecture
Memory Guard is implemented as a daemon running as an MBean in a J2EE container managed by Cluster Ready Services (CRS).
Memory Guard is hosted on the qosmserver resource that runs on any cluster node for high availability.
Figure 6-1 Memory Guard Architecture
Cluster Health Monitor sends a metrics stream to Memory Guard that provides real-time information about memory resources for the cluster nodes. This information includes the following:
• Amount of available memory
• Amount of memory currently in use
After getting memory resource information, Memory Guard collects the cluster topology from Oracle Clusterware. Memory Guard uses cluster topology and memory metrics to identify database nodes that have memory stress. Memory is considered stressed when the free memory is less than a certain threshold.
Memory Guard then stops the database services managed by Oracle Clusterware on the stressed node transactionally. Memory Guard relieves the memory stress without affecting already running sessions and their associated transactions. After completion, the memory used by these processes starts freeing up and adding to the pool of the available memory on the node. When Memory Guard detects that the amount of available memory is more than the threshold, it restarts the services on the affected node.
While a service is stopped on a stressed node, the listener redirects new connections for that service to other nodes that provide the same service for non-singleton database instances. However, for the policy-managed databases, the last instance of a service is not stopped to ensure availability.
Note:
Memory Guard can start or stop the services for databases in the Open state.
Memory Guard does not manage the default database service and does not act while upgrading or downgrading a database.
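The stress decision described above reduces to a threshold comparison on free memory. The sketch below illustrates the control flow only; the 10% threshold and the helper function are hypothetical, since the internal threshold is not published here.

```shell
# Sketch of Memory Guard's decision: a node is considered stressed when
# free memory falls below a threshold; services are restarted once free
# memory recovers. The 10% threshold is a made-up illustrative value.
threshold_pct=10
check_node() {
  if [ "$1" -lt "$threshold_pct" ]; then
    echo "stressed: stop CRS-managed services transactionally"
  else
    echo "normal: new sessions allowed, services restarted if stopped"
  fi
}
check_node 5    # node with 5% free memory
check_node 40   # node with 40% free memory
```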
6.3 Enabling Memory Guard in Oracle Real Application Clusters (Oracle RAC) Environment
Memory Guard is automatically enabled when you install Oracle Grid Infrastructure for an Oracle Real Application Clusters (Oracle RAC) or an Oracle RAC One Node database.
Run the following srvctl command to query the status of Memory Guard:
srvctl status qosmserver
Example 6-1 Verifying that Memory Guard is Running on a Node
The following example shows sample output of the status of Memory Guard on qosmserver.
$ srvctl status qosmserver
QoS Management Server is enabled.
QoS Management Server is running on node nodeABC
6.4 Use of Memory Guard in Oracle Real Application Clusters (Oracle RAC) Deployment
Memory Guard autonomously detects and monitors Oracle Real Application Clusters (Oracle RAC) or Oracle RAC One Node databases when they are open.
Memory Guard sends alert notifications when Memory Guard detects memory stress on a database node. You can find Memory Guard alerts in audit logs at $ORACLE_BASE/crsdata/node name/qos/logs/dbwlm/auditing.
Example 6-2 Memory Guard Alert Notifications
The following example shows a Memory Guard log file when the services were stopped due to memory stress.
<MESSAGE>
<HEADER>
<TSTZ_ORIGINATING>2016-07-28T16:11:03.701Z</TSTZ_ORIGINATING>
<COMPONENT_ID>wlm</COMPONENT_ID>
<MSG_TYPE TYPE="NOTIFICATION"></MSG_TYPE>
<MSG_LEVEL>1</MSG_LEVEL>
<HOST_ID>hostABC</HOST_ID>
<HOST_NWADDR>11.111.1.111</HOST_NWADDR>
<MODULE_ID>gomlogger</MODULE_ID>
<THREAD_ID>26</THREAD_ID>
<USER_ID>userABC</USER_ID>
<SUPPL_ATTRS>
<ATTR NAME="DBWLM_OPERATION_USER_ID">userABC</ATTR>
<ATTR NAME="DBWLM_THREAD_NAME">MPA Task Thread 1469722257648</ATTR>
</SUPPL_ATTRS>
</HEADER>
<PAYLOAD>
<MSG_TEXT>Server Pool Generic has violation risk level RED.</MSG_TEXT>
</PAYLOAD>
</MESSAGE>
<MESSAGE>
<HEADER>
<TSTZ_ORIGINATING>2016-07-28T16:11:03.701Z</TSTZ_ORIGINATING>
<COMPONENT_ID>wlm</COMPONENT_ID>
<MSG_TYPE TYPE="NOTIFICATION"></MSG_TYPE>
<MSG_LEVEL>1</MSG_LEVEL>
<HOST_ID>hostABC</HOST_ID>
<HOST_NWADDR>11.111.1.111</HOST_NWADDR>
<MODULE_ID>gomlogger</MODULE_ID>
<THREAD_ID>26</THREAD_ID>
<USER_ID>userABC</USER_ID>
<SUPPL_ATTRS>
<ATTR NAME="DBWLM_OPERATION_USER_ID">userABC</ATTR>
<ATTR NAME="DBWLM_THREAD_NAME">MPA Task Thread 1469722257648</ATTR>
</SUPPL_ATTRS>
</HEADER>
<PAYLOAD>
<MSG_TEXT>Server userABC-hostABC-0 has violation risk level RED. New connection requests will no longer be accepted.</MSG_TEXT>
</PAYLOAD>
</MESSAGE>
The following example shows a Memory Guard log file when the services were restarted after relieving the memory stress.
<MESSAGE>
<HEADER>
<TSTZ_ORIGINATING>2016-07-28T16:11:07.674Z</TSTZ_ORIGINATING>
<COMPONENT_ID>wlm</COMPONENT_ID>
<MSG_TYPE TYPE="NOTIFICATION"></MSG_TYPE>
<MSG_LEVEL>1</MSG_LEVEL>
<HOST_ID>hostABC</HOST_ID>
<HOST_NWADDR>11.111.1.111</HOST_NWADDR>
<MODULE_ID>gomlogger</MODULE_ID>
<THREAD_ID>26</THREAD_ID>
<USER_ID>userABC</USER_ID>
<SUPPL_ATTRS>
<ATTR NAME="DBWLM_OPERATION_USER_ID">userABC</ATTR>
<ATTR NAME="DBWLM_THREAD_NAME">MPA Task Thread 1469722257648</ATTR>
</SUPPL_ATTRS>
</HEADER>
<PAYLOAD>
<MSG_TEXT>Memory pressure in Server Pool Generic has returned to normal.</MSG_TEXT>
</PAYLOAD>
</MESSAGE>
<MESSAGE>
<HEADER>
<TSTZ_ORIGINATING>2016-07-28T16:11:07.674Z</TSTZ_ORIGINATING>
<COMPONENT_ID>wlm</COMPONENT_ID>
<MSG_TYPE TYPE="NOTIFICATION"></MSG_TYPE>
<MSG_LEVEL>1</MSG_LEVEL>
<HOST_ID>hostABC</HOST_ID>
<HOST_NWADDR>11.111.1.111</HOST_NWADDR>
<MODULE_ID>gomlogger</MODULE_ID>
<THREAD_ID>26</THREAD_ID>
<USER_ID>userABC</USER_ID>
<SUPPL_ATTRS>
<ATTR NAME="DBWLM_OPERATION_USER_ID">userABC</ATTR>
<ATTR NAME="DBWLM_THREAD_NAME">MPA Task Thread 1469722257648</ATTR>
</SUPPL_ATTRS>
</HEADER>
<PAYLOAD>
<MSG_TEXT>Memory pressure in server userABC-hostABC-0 has returned to normal. New connection requests are now accepted.</MSG_TEXT>
</PAYLOAD>
</MESSAGE>
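Because each MSG_TEXT element in these audit logs sits on a single line, the human-readable notifications can be skimmed with a line-oriented tool. A sketch (the sed expression is illustrative and assumes the one-line-per-element layout shown above):

```shell
# Pull the human-readable MSG_TEXT payloads out of a Memory Guard audit
# log. Assumes each <MSG_TEXT>...</MSG_TEXT> fits on one line, as in the
# examples above; a real XML parser would be more robust.
sed -n 's/.*<MSG_TEXT>\(.*\)<\/MSG_TEXT>.*/\1/p' <<'EOF'
<PAYLOAD>
<MSG_TEXT>Server Pool Generic has violation risk level RED.</MSG_TEXT>
</PAYLOAD>
EOF
```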
7 Resolving Database and Database Instance Hangs
Hang Manager preserves the database performance by resolving hangs and keeping the resources available.
Topics:
• Hang Manager autonomously runs as a DIA0 task within the database.
• Optional Configuration for Hang Manager
You can adjust the sensitivity, and control the size and number of the log files used by Hang Manager.
• Hang Manager Diagnostics and Logging
Hang Manager autonomously resolves hangs and continuously logs the resolutions in the database alert logs and the diagnostics in the trace files.
Related Topics:
• Hang Manager is an Oracle Real Application Clusters (Oracle RAC) environment feature that autonomously resolves hangs and keeps the resources available.
7.1 Hang Manager Architecture
Hang Manager autonomously runs as a DIA0 task within the database.
Figure 7-1 Hang Manager Architecture
Hang Manager works in the following three phases:
• Detect: In this phase, Hang Manager collects the data on all the nodes and detects the sessions that are waiting for the resources held by another session.
• Analyze: In this phase, Hang Manager analyzes the sessions detected in the Detect phase to determine if the sessions are part of a potential hang. If the sessions are suspected to be hung, Hang Manager then waits for a certain threshold time period to ensure that the sessions are hung.
• Verify: In this phase, after the threshold time period is up, Hang Manager verifies that the sessions are hung and selects a victim session. The victim session is the session that is causing the hang.
After the victim session is selected, Hang Manager applies hang resolution methods on the victim session. If the chain of sessions or the hang resolves automatically, then Hang Manager does not apply hang resolution methods. However, if the hang does not resolve by itself, then Hang Manager resolves the hang by terminating the victim session. If terminating the session fails, then Hang Manager terminates the process of the session. This entire process is autonomous, does not block resources for a long period, and does not affect performance.
Hang Manager also considers Oracle Database QoS Management policies, performance classes, and ranks that you use to maintain performance objectives.
For example, if a high rank session is included in the chain of hung sessions, then Hang Manager expedites the termination of the victim session. Termination of the victim session prevents the high rank session from waiting too long and helps to maintain the performance objective of the high rank session.
7.2 Optional Configuration for Hang Manager
You can adjust the sensitivity, and control the size and number of the log files used by Hang Manager.
Sensitivity
If Hang Manager detects a hang, then Hang Manager waits for a certain threshold time period to ensure that the sessions are hung. Change the threshold time period by using the DBMS_HANG_MANAGER package to set the sensitivity parameter to either Normal or High. If the sensitivity parameter is set to Normal, then Hang Manager waits for the default time period. However, if the sensitivity is set to High, then the time period is reduced by 50%.
By default, the sensitivity parameter is set to Normal. To set Hang Manager sensitivity, run the following commands in SQL*Plus as the SYS user:
• To set the sensitivity parameter to Normal:
exec dbms_hang_manager.set(dbms_hang_manager.sensitivity, dbms_hang_manager.sensitivity_normal);
• To set the sensitivity parameter to High:
exec dbms_hang_manager.set(dbms_hang_manager.sensitivity, dbms_hang_manager.sensitivity_high);
Size of the Trace Log File
Hang Manager logs detailed diagnostics of the hangs in trace files with _base_ in the file name. Change the size of the trace files in bytes with the base_file_size_limit parameter. For example, run the following command in SQL*Plus to set the trace file size limit to 100 MB:
exec dbms_hang_manager.set(dbms_hang_manager.base_file_size_limit, 104857600);
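The byte value passed above is simply 100 MB expressed in bytes; a quick shell check confirms the arithmetic before you paste a hand-computed limit into SQL*Plus:

```shell
# base_file_size_limit takes bytes: 100 MB = 100 * 1024 * 1024 = 104857600.
echo $(( 100 * 1024 * 1024 ))
```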
Number of Trace Log Files
The base Hang Manager trace files are part of a trace file set. Change the number of trace files in the trace file set with the base_file_set_count parameter. By default, the base_file_set_count parameter is set to 5. For example, run the following command in SQL*Plus to set the number of trace files in the trace file set to 6:
exec dbms_hang_manager.set(dbms_hang_manager.base_file_set_count, 6);
7.3 Hang Manager Diagnostics and Logging
Hang Manager autonomously resolves hangs and continuously logs the resolutions in the database alert logs and the diagnostics in the trace files.
Hang Manager logs the resolutions in the database alert logs as Automatic Diagnostic Repository (ADR) incidents with incident code ORA-32701.
You also get detailed diagnostics about the hang detection in the trace files. Trace files and alert logs have file names starting with database instance_dia0_.
• The trace files are stored in the $ADR_BASE/diag/rdbms/database name/database instance/incident/incdir_xxxxxx directory.
• The alert logs are stored in the $ADR_BASE/diag/rdbms/database name/database instance/trace directory.
Example 7-1 Hang Manager Trace File for a Local Instance
This example shows the output from Hang Manager for the local database instance.
Trace Log File .../oracle/log/diag/rdbms/hm1/hm11/incident/incdir_111/ hm11_dia0_11111_i111.trc
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
...
*** 2016-07-16T12:39:02.715475-07:00
HM: Hang Statistics - only statistics with non-zero values are listed
current number of active sessions 3
current number of hung sessions 1
instance health (in terms of hung sessions) 66.67%
number of cluster-wide active sessions 9
number of cluster-wide hung sessions 5
cluster health (in terms of hung sessions) 44.45%
*** 2016-07-16T12:39:02.715681-07:00
Resolvable Hangs in the System
Root Chain Total Hang
Hang Hang Inst Root #hung #hung Hang Hang Resolution
ID Type Status Num Sess Sess Sess Conf Span Action
----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------
1 HANG RSLNPEND 3 44 3 5 HIGH GLOBAL Terminate Process
Hang Resolution Reason: Although hangs of this root type are typically
self-resolving, the previously ignored hang was automatically resolved.
kjznshngtbldmp: Hang's QoS Policy and Multiplier Checksum 0x0
Inst Sess Ser Proc Wait
Num ID Num OSPID Name Event
----- ------ ----- --------- ----- -----
1 111 1234 34567 FG gc buffer busy acquire
1 22 12345 34568 FG gc current request
3 44 23456 34569 FG not in wait
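The health percentages in the statistics block above are the share of active sessions that are not hung: with 3 active and 1 hung, (3-1)/3 ≈ 66.67%; cluster-wide, (9-5)/9 ≈ 44.44% (the trace shows 44.45% due to rounding). A small sketch of the computation (the helper function is illustrative, not an Oracle tool):

```shell
# Reproduce the Hang Manager "health" figures from the trace statistics:
#   health = (active - hung) / active * 100
health() { awk -v a="$1" -v h="$2" 'BEGIN { printf "%.2f%%\n", (a - h) / a * 100 }'; }
health 3 1   # instance: 3 active sessions, 1 hung
health 9 5   # cluster: 9 active sessions, 5 hung
```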
Example 7-2 Error Message in the Alert Log Indicating a Hung Session
This example shows a Hang Manager alert log on the master instance.
2016-07-16T12:39:02.616573-07:00
Errors in file .../oracle/log/diag/rdbms/hm1/hm1/trace/hm1_dia0_i1111.trc
(incident=1111):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: .../oracle/log/diag/rdbms/hm1/hm1/incident/incdir_1111/ hm1_dia0_11111_i1111.trc
2016-07-16T12:39:02.674061-07:00
DIA0 requesting termination of session sid:44 with serial # 23456 (ospid:34569) on instance 3
due to a GLOBAL, HIGH confidence hang with ID=1.
Hang Resolution Reason: Although hangs of this root type are typically
self-resolving, the previously ignored hang was automatically resolved.
DIA0: Examine the alert log on instance 3 for session termination status of hang with ID=1.
Example 7-3 Error Message in the Alert Log Showing a Session Hang Resolved by Hang Manager
This example shows a Hang Manager alert log on the local instance for resolved hangs.
2016-07-16T12:39:02.707822-07:00
Errors in file .../oracle/log/diag/rdbms/hm1/hm11/trace/hm11_dia0_11111.trc
(incident=169):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: .../oracle/log/diag/rdbms/hm1/hm11/incident/incdir_169/ hm11_dia0_30676_i169.trc
2016-07-16T12:39:05.086593-07:00
DIA0 terminating blocker (ospid: 30872 sid: 44 ser#: 23456) of hang with ID = 1
requested by master DIA0 process on instance 1
Hang Resolution Reason: Although hangs of this root type are typically
self-resolving, the previously ignored hang was automatically resolved.
by terminating session sid:44 with serial # 23456 (ospid:34569)
...
DIA0 successfully terminated session sid:44 with serial # 23456 (ospid:34569) with status 0.
8 Monitoring System Metrics for Cluster Nodes
This chapter explains the methods to monitor Oracle Clusterware.
Oracle recommends that you use Oracle Enterprise Manager to monitor everyday operations of Oracle Clusterware.
Cluster Health Monitor monitors the complete technology stack, including the operating system, ensuring smooth cluster operations. Both components are enabled by default for any Oracle cluster, and Oracle strongly recommends that you use both. Also, monitor Oracle Clusterware-managed resources using the Clusterware resource activity log.
Topics:
• Monitoring Oracle Clusterware with Oracle Enterprise Manager
Use Oracle Enterprise Manager to monitor the Oracle Clusterware environment.
• Monitoring Oracle Clusterware with Cluster Health Monitor
You can use the OCLUMON command-line tool to interact with Cluster Health Monitor.
• Using the Cluster Resource Activity Log to Monitor Cluster Resource Failures
The cluster resource activity log provides precise and specific information about a resource failure, separate from diagnostic logs.
Related Topics:
• Managing the Cluster Resource Activity Log
Oracle Clusterware stores logs about resource failures in the cluster resource activity log, which is located in the Grid Infrastructure Management Repository.
8.1 Monitoring Oracle Clusterware with Oracle Enterprise Manager
Use Oracle Enterprise Manager to monitor the Oracle Clusterware environment.
When you log in to Oracle Enterprise Manager using a client browser, the Cluster Database Home page appears, where you can monitor the status of both Oracle Database and Oracle Clusterware environments. Oracle Clusterware monitoring includes the following details:
• Current and historical Cluster Health Monitor data in Oracle Enterprise Manager on the cluster target
• Notifications if there are any VIP relocations
• Status of the Oracle Clusterware on each node of the cluster using information obtained through the Cluster Verification Utility (CVU)
• Notifications if node applications (nodeapps) start or stop
• Notification of issues in the Oracle Clusterware alert log for the Oracle Cluster Registry, voting file issues (if any), and node evictions
The Cluster Database Home page is similar to a single-instance Database Home page. However, on the Cluster Database Home page, Oracle Enterprise Manager displays the system state and availability. The system state and availability includes a summary about alert messages and job activity, and links to all the database and Oracle Automatic Storage Management (Oracle ASM) instances. For example, track problems with services on the cluster, including when a service is not running on all the preferred instances or when a service response time threshold is not being met.
Use the Oracle Enterprise Manager Interconnects page to monitor the Oracle Clusterware environment. The Interconnects page displays the following details:
• Public and private interfaces on the cluster
• Overall throughput on the private interconnect
• Individual throughput on each of the network interfaces
• Error rates (if any)
• Load contributed by database instances on the interconnect
• Notifications if a database instance is using the public interface due to misconfiguration
• Throughput contributed by individual instances on the interconnect
All the information listed earlier is also available as collections that have a historic view. The historic view is useful with cluster cache coherency, such as when diagnosing problems related to cluster wait events. Access the Interconnects page by clicking the Interconnect tab on the Cluster Database home page.
Also, the Oracle Enterprise Manager Cluster Database Performance page provides a quick glimpse of the performance statistics for a database. Statistics are rolled up across all the instances in the cluster database in charts. Using the links next to the charts, you can get more specific information and perform any of the following tasks:
• Identify the causes of performance issues
• Decide whether resources must be added or redistributed
• Tune your SQL plan and schema for better optimization
• Resolve performance issues
The charts on the Cluster Database Performance page include the following:
• Chart for Cluster Host Load Average: The Cluster Host Load Average chart in the Cluster Database Performance page shows potential problems that are outside the database. The chart shows maximum, average, and minimum load values for available nodes in the cluster for the previous hour.
• Chart for Global Cache Block Access Latency: Each cluster database instance has its own buffer cache in its System Global Area (SGA). Using Cache Fusion, Oracle RAC environments logically combine the buffer cache of each instance to enable the database instances to process data as if the data resided on a logically combined, single cache.
• Chart for Average Active Sessions: The Average Active Sessions chart in the Cluster Database Performance page shows potential problems inside the database. Categories, called wait classes, show how much of the database is using a resource, such as CPU or disk I/O. Comparing CPU time to wait time helps to determine how much of the response time is consumed with useful work rather than waiting for resources that are potentially held by other processes.
• Chart for Database Throughput: The Database Throughput charts summarize any resource contention that appears in the Average Active Sessions chart, and also show how much work the database is performing on behalf of the users or applications. The Per Second view shows the number of transactions compared to the number of logons, and the amount of physical reads compared to the redo size for each second. The Per Transaction view shows the amount of physical reads compared to the redo size for each transaction. Logons is the number of users that are logged on to the database.
In addition, the Top Activity drop-down menu on the Cluster Database Performance page enables you to see the activity by wait events, services, and instances. You can also see details about SQL and sessions at a prior point in time by moving the slider on the chart.
Related Topics:
• Oracle Database 2 Day + Real Application Clusters Guide
8.2 Monitoring Oracle Clusterware with Cluster Health Monitor
You can use the OCLUMON command-line tool to interact with Cluster Health Monitor.
OCLUMON is included with Cluster Health Monitor. You can use it to query the Cluster
Health Monitor repository to display node-specific metrics for a specified time period.
You can also use OCLUMON to perform miscellaneous administrative tasks, such as the following:
• Changing the debug levels with the oclumon debug command
• Querying the version of Cluster Health Monitor with the oclumon version command
• Viewing the collected information in the form of a node view using the oclumon dumpnodeview command
• Changing the metrics database size using the oclumon manage command
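Appendix B documents each of these commands in detail. As a quick sketch, the tasks above map to invocations like the following (the daemon module name, log level, duration, and the repsize parameter are illustrative values; verify them against your release):

```shell
# Change the log level of the system monitor service module CRFMOND to 3
oclumon debug log osysmond CRFMOND:3

# Query the version of Cluster Health Monitor
oclumon version

# View the node view collected over an illustrative 15-minute window
oclumon dumpnodeview -last "00:15:00"

# Query the current metrics database (repository) size
oclumon manage -get repsize
```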
Related Topics:
• OCLUMON Command Reference
Use the command-line tool to query the Cluster Health Monitor repository to display node-specific metrics for a specific time period.
8.3 Using the Cluster Resource Activity Log to Monitor Cluster Resource Failures
The cluster resource activity log provides precise and specific information about a resource failure, separate from diagnostic logs.
If an Oracle Clusterware-managed resource fails, then Oracle Clusterware logs messages about the failure in the cluster resource activity log located in the Grid
Infrastructure Management Repository. Failures can occur as a result of a problem with a resource, a hosting node, or the network. The cluster resource activity log provides a unified view of the cause of resource failure.
Writes to the cluster resource activity log are tagged with an activity ID, and any related data gets the same parent activity ID and is nested under the parent data. For example, if Oracle Clusterware is running and you run the crsctl stop cluster -all command, then all activities get activity IDs, and related activities are tagged with the same parent activity ID. On each node, the command creates sub-IDs under the parent IDs, and tags each of the respective activities with their corresponding activity ID. Further, each resource on the individual nodes creates sub-IDs based on the parent ID, creating a hierarchy of activity IDs. The hierarchy of activity IDs enables you to analyze the data to find specific activities.
For example, you may have many resources with complicated dependencies among each other, and with a database service. On Friday, you see that all of the resources are running on one node but when you return on Monday, every resource is on a different node, and you want to know why. Using the crsctl query calog command, you can query the cluster resource activity log for all activities involving those resources and the database service. The output provides a complete flow, and you can query each sub-ID within the parent service failover ID and see, specifically, what happened and why.
You can query any number of fields in the cluster resource activity log using filters. For example, you can query all the activities written by specific operating system users such as root. The output produced by the crsctl query calog command can be displayed in either a tabular format or in XML format.
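For illustration, the queries described above might look like the following (the -filter, -aftertime, and -xmlfmt options and the writer_user attribute name reflect common crsctl query calog usage, but verify them against your release; the timestamp is a placeholder):

```shell
# Show activities written by the operating system user root, in XML format
crsctl query calog -filter "writer_user == 'root'" -xmlfmt

# Show activities recorded after a given timestamp, in the default tabular format
crsctl query calog -aftertime "2017-05-01 09:00:00"
```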
The cluster resource activity log is an adjunct to current Oracle Clusterware logging and alert log messages.
Note:
Oracle Clusterware does not write messages that contain security-related information, such as log-in credentials, to the cluster activity log.
Use the following commands to manage and view the contents of the cluster resource activity log:
9 Monitoring and Managing Database Workload Performance
Oracle Database Quality of Service (QoS) Management is an automated, policy-based product that monitors the workload requests for an entire system.
This chapter contains the following topics:
• What Does Oracle Database Quality of Service (QoS) Management Manage?
Oracle Database Quality of Service (QoS) Management works with Oracle Real Application Clusters (Oracle RAC) and Oracle Clusterware. Oracle Database QoS Management operates over an entire Oracle RAC cluster, which can support various applications.
• How Does Oracle Database Quality of Service (QoS) Management Work?
Oracle Database Quality of Service (QoS) Management uses a resource management plan and user-specific performance objectives to allocate resources to defined workloads.
• Overview of Metrics
Oracle Database Quality of Service (QoS) Management bases its decisions on observations of how long work requests spend waiting for resources.
• Benefits of Using Oracle Database Quality of Service (QoS) Management
Oracle Database QoS Management helps manage the resources shared by databases and their services in a cluster.
Related Topics:
• Introduction to Oracle Database Quality of Service (QoS) Management
Oracle Database Quality of Service (QoS) Management manages the resources that are shared across applications.
9.1 What Does Oracle Database Quality of Service (QoS) Management Manage?
Oracle Database Quality of Service (QoS) Management works with Oracle Real
Application Clusters (Oracle RAC) and Oracle Clusterware. Oracle Database QoS
Management operates over an entire Oracle RAC cluster, which can support various applications.
Oracle Database QoS Management manages the CPU resource for a cluster. Oracle
Database QoS Management does not manage I/O resources. Therefore, Oracle
Database QoS Management does not effectively manage I/O intensive applications.
Oracle Database QoS Management integrates with the Oracle RAC database through the following technologies to manage resources within a cluster:
• Database Services
• Oracle Database Resource Manager
• Oracle Clusterware
• Run-time Connection Load Balancing
Oracle Database QoS Management periodically evaluates the resource wait times for all used resources. If the average response time for the work requests in a
Performance Class is greater than the value specified in its Performance Objective, then Oracle Database QoS Management uses the collected metrics to find the bottlenecked resource. If possible, Oracle Database QoS Management provides recommendations for adjusting the size of the server pools or altering the consumer group mappings in the resource plan used by Oracle Database Resource Manager.
Note:
Oracle Database QoS Management supports only OLTP workloads. The following types of workloads (or database requests) are not supported:
• Batch workloads
• Workloads that require more than one second to complete
• Workloads that use parallel data manipulation language (DML)
• Workloads that query GV$ views at a significant utilization level
9.2 How Does Oracle Database Quality of Service (QoS) Management Work?
Oracle Database Quality of Service (QoS) Management uses a resource management plan and user-specific performance objectives to allocate resources to defined workloads.
With Oracle Database, use services to manage the workload on your system by starting services on groups of servers that are dedicated to particular workloads. At the database tier, for example, you could dedicate one group of servers to online transaction processing (OLTP), dedicate another group of servers to application testing, and dedicate a third group of servers for internal applications. The system administrator can allocate resources to specific workloads by manually changing the number of servers on which a database service is allowed to run.
Using groups of servers in this way isolates the workloads from each other to prevent demand surges, failures, and other problems in one workload from affecting the other workloads. However, in this type of deployment, you must separately provision the servers to each group to satisfy the peak demand of each workload because resources are not shared.
Oracle Database QoS Management performs the following actions:
1. Uses a policy created by the Oracle Database QoS Management administrator to do the following:
• Assign each work request to a Performance Class by using the attributes of the incoming work requests, such as the database service to which the application connects.
• Determine the target response times (Performance Objectives) for each Performance Class.
• Determine which Performance Classes are the most critical to your business.
2. Monitors the resource usage and resource wait times for all the Performance Classes.
3. Analyzes the average response time for a Performance Class against the Performance Objective in effect for that Performance Class.
4. Produces recommendations for reallocating resources to improve the performance of a Performance Class that is exceeding its target response time.
5. Provides an analysis of the predicted impact to performance levels for each Performance Class if that recommendation is implemented.
6. Implements the actions listed in the recommendation when directed to by the Oracle Database QoS Management administrator.
7. Evaluates the system to verify that each Performance Class is meeting its Performance Objective after the resources have been reallocated.
9.3 Overview of Metrics
Oracle Database Quality of Service (QoS) Management bases its decisions on observations of how long work requests spend waiting for resources.
Examples of resources that work requests can wait for include hardware resources, such as CPU cycles, disk I/O queues, and Global Cache blocks. Other waits can occur within the database, such as latches, locks, pins, and so on. Although the resource waits within the database are accounted for in the Oracle Database QoS Management metrics, they are not managed or specified by type.
The response time of a work request consists of execution time and various wait times; changing or improving the execution time generally requires application source code changes. Oracle Database QoS Management therefore observes and manages only wait times.
Oracle Database QoS Management uses a standardized set of metrics, which are collected by all the servers in the system. There are two types of metrics used to measure the response time of work requests: performance metrics and resource metrics. These metrics enable direct observation of the wait time incurred by work requests in each Performance Class, for each resource requested, as the work request traverses the servers, networks, and storage devices that form the system.
Another type of metric, the Performance Satisfaction Metric, measures how well the
Performance Objectives for a Performance Class are being met.
Related Topics:
•
Oracle Database Quality of Service Management User's Guide
9.4 Benefits of Using Oracle Database Quality of Service (QoS) Management
Oracle Database QoS Management helps manage the resources shared by databases and their services in a cluster.
In a typical company, when the response times of your applications are not within acceptable levels, problem resolution can be slow. Often, the first questions that administrators ask are: "Did we configure the system correctly? Is there a parameter change that fixes the problem? Do we need more hardware?" Unfortunately, these questions are difficult to answer precisely. The result is often hours of unproductive and frustrating experimentation.
Oracle Database QoS Management provides the following benefits:
• Reduces the time and expertise requirements for system administrators who manage Oracle Real Application Clusters (Oracle RAC) resources
• Helps reduce the number of performance outages
• Reduces the time required to resolve problems that limit or decrease the performance of your applications
• Provides stability to the system as the workloads change
• Makes the addition or removal of servers transparent to applications
• Reduces the impact on the system caused by server failures
• Helps ensure that service-level agreements (SLAs) are met
• Enables more effective sharing of hardware resources
Oracle Database QoS Management can help identify and resolve performance bottlenecks. Oracle Database QoS Management does not diagnose or tune application or database performance issues. When tuning the performance of your applications, the goal is to achieve optimal performance. Oracle Database QoS
Management does not seek to make your applications run faster. Instead, Oracle
Database QoS Management works to remove obstacles that prevent your applications from running at their optimal performance levels.
A Oracle ORAchk and Oracle EXAchk Command-Line Options
Most command-line options apply to both Oracle ORAchk and Oracle EXAchk. Use the command options to control the behavior of Oracle ORAchk and Oracle EXAchk.
Syntax
$ ./orachk options
[-h] [-a] [-b] [-v] [-p] [-m] [-u] [-f] [-o]
[-clusternodes clusternames]
[-output path]
[-dbnames dbnames]
[-localonly]
[-debug]
[-dbnone | -dball]
[-c]
[-upgrade | -noupgrade]
[-syslog]
[-skip_usr_def_checks]
[-checkfaileduploads]
[-uploadfailed all | comma-delimited list of collections]
[-fileattr [start | check | remove] [-includedir path] [-excludediscovery] [-baseline path] [-fileattronly]]
[-testemail all | "NOTIFICATION_EMAIL=comma-delimited list of email addresses"]
[-setdbupload all | db upload variable, for example, RAT_UPLOAD_CONNECT_STRING,
RAT_UPLOAD_PASSWORD]
[-unsetdbupload all | db upload variable, for example, RAT_UPLOAD_CONNECT_STRING,
RAT_UPLOAD_PASSWORD]
[-checkdbupload]
[-getdbupload]
[-cmupgrade]
[-sendemail "NOTIFICATION_EMAIL=comma-delimited list of email addresses"]
[-nopass]
[-noscore]
[-showpass]
[-show_critical]
[-diff Old Report New Report [-outfile Output HTML] [-force]]
[-merge report 1 report 2 [-force]]
[-tag tagname]
[-daemon [-id ID] -set parameter | [-id ID] -unset parameter | all | [-id ID] -get
parameter | all]
AUTORUN_SCHEDULE=value | AUTORUN_FLAGS=flags | NOTIFICATION_EMAIL=email |
PASSWORD_CHECK_INTERVAL=number of hours | collection_retention=number of days
[-nodaemon]
[-profile asm | clusterware | corroborate | dba | ebs | emagent | emoms | em | goldengate | hardware | maa | oam | oim | oud | ovn | peoplesoft | preinstall | prepatch | security | siebel | solaris_cluster | storage | switch | sysadmin | timesten | user_defined_checks | zfs ]
[-excludeprofile asm | clusterware | corroborate | dba | ebs | emagent | emoms | em | goldengate | hardware | maa | oam | oim | oud | ovn | peoplesoft | preinstall | prepatch | security | siebel | solaris_cluster | storage | switch | sysadmin | timesten | user_defined_checks | zfs]
[-acchk -javahome path to jdk8
-asmhome path to asm-all-5.0.3.jar -appjar directory where jar files are present for
concrete class -apptrc directory where trace files are present for coverage class]
[-check check ids | -excludecheck check ids]
[-zfsnodes nodes]
[-zfssa appliance names]
[-dbserial | -dbparallel [n] | -dbparallelmax]
[-idmpreinstall | -idmpostinstall | -idmruntime] [-topology topology.xml |
-credconfig credconfig] | -idmdbpreinstall | -idmdbpostinstall | -idmdbruntime]
[-idm_config IDMCONFIG] [-idmdiscargs IDMDISCARGS]
[-idmhcargs IDMHCARGS | -h]
$ ./exachk options
[-h] [-a] [-b] [-v] [-p] [-m] [-u] [-f] [-o]
[-clusternodes clusternames]
[-output path]
[-dbnames dbnames]
[-localonly]
[-debug]
[-dbnone | -dball]
[-c]
[-upgrade | -noupgrade]
[-syslog] [-skip_usr_def_checks]
[-checkfaileduploads]
[-uploadfailed all | comma-delimited list of collections]
[-fileattr [start | check | remove] [-includedir path] [-excludediscovery] [-baseline path] [-fileattronly]]
[-testemail all | "NOTIFICATION_EMAIL=comma-delimited list of email addresses"]
[-setdbupload all | db upload variable, for example, RAT_UPLOAD_CONNECT_STRING,
RAT_UPLOAD_PASSWORD]
[-unsetdbupload all | db upload variable, for example, RAT_UPLOAD_CONNECT_STRING,
RAT_UPLOAD_PASSWORD]
[-checkdbupload]
[-getdbupload]
[-cmupgrade] [-sendemail "NOTIFICATION_EMAIL=comma-delimited list of email
addresses"]
[-nopass]
[-noscore]
[-showpass]
[-show_critical]
[-diff Old Report New Report [-outfile Output HTML] [-force]]
[-merge report 1 report 2 [-force]]
[-tag tagname]
[-auto_restart -initsetup | -initdebugsetup | -initrmsetup | -initcheck | -initpresetup | -h]
[-d start|start_debug|stop|status|info|stop_client|nextautorun|-h]
[-daemon [-id ID] -set parameter | [-id ID] -unset parameter | all | [-id ID] -get
parameter | all]
AUTORUN_SCHEDULE=value | AUTORUN_FLAGS=flags | NOTIFICATION_EMAIL=email |
PASSWORD_CHECK_INTERVAL=number of hours | collection_retention=number of days
[-nodaemon]
[-unlockcells all | -cells comma-delimited list of names or IPs of cells] [-lockcells all | -cells comma-delimited list of names or IPs of cells]
[-usecompute]
[-exadiff Exalogic collection1 Exalogic collection2]
[-vmguest ]
[-hybrid [-phy nodes]]
[-profile asm | bi_middleware | clusterware | compute_node | control_VM | corroborate | dba | ebs | el_extensive | el_lite | el_rackcompare | emagent | emoms | em | goldengate | hardware | maa | nimbula | obiee | ovn | peoplesoft | platinum | preinstall | prepatch | security | siebel | solaris_cluster | storage | switch | sysadmin | timesten | user_defined_checks | virtual_infra]
[-excludeprofile asm | bi_middleware | clusterware | compute_node | control_VM | corroborate | dba | ebs | el_extensive | el_lite | el_rackcompare | emagent | emoms | em | goldengate | hardware | maa | nimbula | obiee | ovn | peoplesoft | platinum | preinstall | prepatch | security | siebel | solaris_cluster | storage | switch | sysadmin | timesten | user_defined_checks | virtual_infra]
[-check check ids | -excludecheck check ids]
[-cells cells]
[-ibswitches switches]
[-torswitches]
[-extzfsnodes nodes]
[-dbserial | -dbparallel [n] | -dbparallelmax | -allserial]
[-allserial | -dbnodeserial |-cellserial | -switchserial]
• Running Generic Oracle ORAchk and Oracle EXAchk Commands
List of command options common to Oracle ORAchk and Oracle EXAchk.
• Controlling the Scope of Checks
Use the list of commands in this section to control the scope of checks.
• Managing the Report Output
Use the list of commands in this section to manage the report output.
• Uploading Results to Database
Use the list of commands in this section to upload results to the database.
• Configuring the Daemon Mode
Use the daemon to configure automatic health check runs at scheduled intervals.
• Controlling the Behavior of the Daemon
Use the list of commands in this section to control the behavior of the daemon.
• Tracking File Attribute Changes
Use the Oracle ORAchk and Oracle EXAchk -fileattr option and command flags to record and track file attribute settings, and compare snapshots.
A.1 Running Generic Oracle ORAchk and Oracle EXAchk Commands
List of command options common to Oracle ORAchk and Oracle EXAchk.
Syntax
[-a]
[-v]
[-debug]
[-daemon]
[-nodaemon]
[-f]
[-upgrade]
[-noupgrade]
[-testemail all | "NOTIFICATION_EMAIL=comma-delimited list of email addresses"]
[-sendemail "NOTIFICATION_EMAIL=comma-delimited list of email addresses"]
[-dbserial]
[-dbparallel [n]]
[-dbparallelmax]
Parameters
Table A-1 Generic Commands

-a
Runs all checks, including the best practice checks and the recommended patch check. If you do not specify any options, then the tools run all checks by default.

-v
Shows the version of Oracle ORAchk and Oracle EXAchk tools.

-debug
Runs in debug mode. The generated .zip file contains a debug log and other files useful for Oracle Support.

-daemon
Runs only if the daemon is running.

-nodaemon
Does not send commands to the daemon; usage is interactive.

-f
Runs offline. The tools perform health checks on the data already collected from the system.

-upgrade
Forces an upgrade of the version of the tools being run.

-noupgrade
Does not prompt for an upgrade even if a later version is available under the location specified in the RAT_UPGRADE_LOC environment variable.

-testemail all | "NOTIFICATION_EMAIL=comma-delimited list of email addresses"
Sends a test email to validate email configuration.

-sendemail "NOTIFICATION_EMAIL=comma-delimited list of email addresses"
Specify a comma-delimited list of email addresses. Emails the generated HTML report on completion to the specified email addresses.

-dbserial
Runs the SQL, SQL_COLLECT, and OS health checks in serial.

-dbparallel [n]
Runs the SQL, SQL_COLLECT, and OS health checks in parallel, using n number of child processes. Default is 25% of CPUs.

-dbparallelmax
Runs the SQL, SQL_COLLECT, and OS health checks in parallel, using the maximum number of child processes.
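For example, the generic options can be combined in a single run (orachk is shown; exachk accepts the same options):

```shell
# Run all checks, executing database checks across four parallel child processes
./orachk -a -dbparallel 4

# Send a test email to confirm that the notification configuration works
./orachk -testemail all
```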
A.2 Controlling the Scope of Checks
Use the list of commands in this section to control the scope of checks.
Syntax
[-b]
[-p]
[-m]
[-u –o pre]
[-u –o post]
[-clusternodes nodes]
[-dbnames db_names]
[-dbnone]
[-dball]
[-localonly]
[-cells cells]
[-ibswitches switches]
[-profile profile]
[-excludeprofile profile]
[-check check_id]
[-excludecheck check_id]
[-skip_usr_def_checks]
Parameters
Table A-2 Scope of Checks

-b
Runs only the best practice checks. Does not run the recommended patch checks.

-p
Runs only the patch checks.

-m
Excludes the checks for Maximum Availability Architecture (MAA) scorecards.

-u –o pre
Runs the pre-upgrade checks for Oracle Clusterware and database.

-u –o post
Runs the post-upgrade checks for Oracle Clusterware and database.

-clusternodes nodes
Specify a comma-delimited list of node names to run only on a subset of nodes.

-dbnames db_names
Specify a comma-delimited list of database names to run only on a subset of databases.

-dbnone
Does not prompt for database selection and skips all the database checks.

-dball
Does not prompt for database selection and runs the database checks on all databases discovered on the system.

-localonly
Runs only on the local node.

-cells cells
Specify a comma-delimited list of storage server names to run the checks only on a subset of storage servers.

-ibswitches switches
Specify a comma-delimited list of InfiniBand switch names to run the checks only on a subset of InfiniBand switches.

-profile profile
Specify a comma-delimited list of profiles to run only the checks in the specified profiles.

-excludeprofile profile
Specify a comma-delimited list of profiles to exclude the checks in the specified profiles.

-check check_id
Specify a comma-delimited list of check IDs to run only the checks specified in the list of check IDs.

-excludecheck check_id
Specify a comma-delimited list of check IDs to exclude the checks specified in the list of check IDs.

-skip_usr_def_checks
Does not run the checks specified in the user-defined xml file.
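For example, the scope options can be combined to narrow a run (the node and database names below are placeholders):

```shell
# Run only the best practice checks on two nodes for two databases
./orachk -b -clusternodes node1,node2 -dbnames orcl,sales

# Run only the sysadmin profile checks, skipping all database checks
./orachk -profile sysadmin -dbnone
```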
A.3 Managing the Report Output
Use the list of commands in this section to manage the report output.
Syntax
[-syslog] [-tag tagname]
[-o]
[-nopass]
[-noscore]
[-diff old_report new_report [-outfile output_HTML]]
[-merge [-force] collections]
Parameters
Table A-3 Managing Output

-syslog
Writes JSON results to syslog.

-tag tagname
Appends the tagname specified to the output report name. The tagname must contain only alphanumeric characters.

-o
Argument to an option. If -o is followed by v or verbose (neither value is case-sensitive), then the command prints passed checks on the screen. If the -o option is not specified, then the command prints only the failed checks on the screen.

-nopass
Does not show passed checks in the generated output.

-noscore
Does not print the health score in the HTML report.

-diff old_report new_report [-outfile output_HTML]
Reports the difference between the two HTML reports. Specify a directory name, a ZIP file, or an HTML report file as old_report and new_report.

-merge [-force] collections
Merges a comma-delimited list of collections and prepares a single report.
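For example, to compare or combine earlier runs (the collection names below are placeholders):

```shell
# Report the difference between two earlier reports, writing the result to a file
./orachk -diff orachk_host1_0501 orachk_host1_0508 -outfile diff_report.html

# Merge two collections into a single report
./orachk -merge orachk_host1_0501,orachk_host1_0508
```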
A.4 Uploading Results to Database
Use the list of commands in this section to upload results to the database.
Syntax
[-setdbupload all|list of variable names]
[-unsetdbupload all|list of variable names]
[-checkdbupload]
[-getdbupload]
[-checkfaileduploads]
[-uploadfailed all|list of failed collections]
Parameters
Table A-4 Uploading Results to Database

-setdbupload all | variable_names
Sets the values in the wallet to upload health check run results to the database.
all: Sets all the variables in the wallet.
variable_names: Specify a comma-delimited list of variables to set.

-unsetdbupload all | variable_names
Unsets the values in the wallet to upload health check run results to the database.
all: Unsets all the variables in the wallet.
variable_names: Specify a comma-delimited list of variables to unset.

-checkdbupload
Checks if the variables are set correctly for uploading the health check run results to the database.

-getdbupload
Prints the variables with their values from the wallet for uploading the health check run results to the database.

-checkfaileduploads
Reports any failed collection uploads.

-uploadfailed all | list of failed collections
Reattempts to upload one or more failed collection uploads.
all: Reattempts to upload all the failed collection uploads.
list of failed collections: Specify a comma-delimited list of collections to upload.
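For example, a typical upload configuration sequence might look like the following (the variable names are the ones listed above):

```shell
# Store the connect string and password in the wallet
./orachk -setdbupload RAT_UPLOAD_CONNECT_STRING,RAT_UPLOAD_PASSWORD

# Verify the wallet variables, then retry any collections that failed to upload
./orachk -checkdbupload
./orachk -checkfaileduploads
./orachk -uploadfailed all
```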
A.5 Configuring the Daemon Mode
Use the daemon to configure automatic health check runs at scheduled intervals.
Note:
If you have an Oracle Engineered System, then in addition to the following usage steps, follow the system-specific instructions.
1. Set the daemon properties.
At a minimum, set AUTORUN_SCHEDULE and NOTIFICATION_EMAIL.
For example, to set the tool to run at 3 AM every Sunday and email the results to [email protected], run the following command:
$ ./orachk -set "AUTORUN_SCHEDULE=3 * * 0;[email protected]"
$ ./exachk -set "AUTORUN_SCHEDULE=3 * * 0;[email protected]"
2. Configure the health check daemon.
3. Start the daemon as root (recommended) or as the Oracle Database or Oracle Grid Infrastructure home owner.
# ./orachk -d start
# ./exachk -d start
4. Answer the questions prompted during startup.
A.6 Controlling the Behavior of the Daemon
Use the list of commands in this section to control the behavior of the daemon.
Syntax
[-id id] –set daemon_option
[-id id] -unset daemon_option | all
[-id id] -get parameter | all
[-d start]
[-d start_debug]
[-d stop]
[-d stop_client]
[-d status]
[-d info]
[-id id] -d nextautorun
[-initsetup]
[-initrmsetup]
[-initcheck]
[-initpresetup]
Parameters
Table A-5 Daemon Options

[-id id] -set daemon_option
Sets the specified daemon parameter. Optionally use -id id with the set command to set specific daemon usage profiles.

[-id id] -unset daemon_option | all
Unsets the parameter. Use with -id id to unset a daemon profile-specific value.

[-id id] -get parameter | all
Displays the value of the specified parameter or all the parameters. Use with -id id to get a daemon profile-specific value.

-d start
Starts the daemon.

-d start_debug
Starts the daemon in debug mode.

-d stop
Stops the daemon.

-d stop_client
Forces a running daemon client to stop.

-d status
Checks the current status of the daemon.

-d info
Displays details about the daemon. The details include installation and when the daemon was started.

[-id id] -d nextautorun
Displays details about when the next scheduled automatic run occurs.

-initsetup
Sets the daemon auto restart function that starts the daemon when the node starts.

-initrmsetup
Removes the automatic restart functionality.

-initcheck
Checks if the automatic restart functionality is set up.
Table A-5 (Cont.) Daemon Options

-initpresetup
Sets the root user equivalency for COMPUTE, STORAGE, and IBSWITCHES (root equivalency for COMPUTE nodes is mandatory for setting up auto restart functionality).
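For example, a short daemon management session might look like the following (the schedule value is illustrative):

```shell
# Schedule automatic runs at 2 AM every day, then confirm the daemon settings
./orachk -set "AUTORUN_SCHEDULE=2 * * *"
./orachk -get all

# Check the daemon status and the time of the next scheduled automatic run
./orachk -d status
./orachk -d nextautorun
```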
A.7 Tracking File Attribute Changes
Use the Oracle ORAchk and Oracle EXAchk -fileattr option and command flags to record and track file attribute settings, and compare snapshots.
Syntax
[-fileattr start]
[-fileattr check]
[-fileattr remove]
[-fileattr [start|check] -includedir directories]
[-fileattr [start|check] -excludediscovery]
[-fileattr check -baseline baseline snapshot path]
[-fileattr check –fileattronly]
Table A-6 List of Oracle ORAchk and Oracle EXAchk File Attribute Tracking Options

-fileattr start
Takes file attribute snapshots of discovered directories, and stores the snapshots in the output directory. By default, this option takes snapshots of Oracle Grid Infrastructure homes and all the installed Oracle Database homes. If a user does not own a particular directory, then the tool does not take snapshots of the directory.

-fileattr check
Takes a new snapshot of discovered directories, and compares it with the previous snapshot.

-fileattr remove
Removes file attribute snapshots and related files.

-fileattr [start | check] -includedir directories
Specify a comma-delimited list of directories to check file attributes.
For example:
./orachk -fileattr start -includedir "/root/home,/etc"
./orachk -fileattr check -includedir "/root/home,/etc"

-fileattr [start | check] -excludediscovery
Excludes the discovered directories.
For example:
./orachk -fileattr start -includedir "/root/home,/etc" -excludediscovery
Table A-6 (Cont.) List of Oracle ORAchk and Oracle EXAchk File Attribute Tracking Options

-fileattr check -baseline baseline_snapshot_path
Uses a snapshot that you designate as the baseline for a snapshot comparison. Provide the path to the snapshot that you want to use as the baseline. A baseline is the starting file attributes that you want to compare to at later times. Current file attributes are compared to the baseline and a delta is reported.
For example:
./orachk -fileattr check -baseline "/tmp/Snapshot"

-fileattr check -fileattronly
Performs only the file attributes check, and then exits Oracle ORAchk.
For example:
./orachk -fileattr check -fileattronly
B OCLUMON Command Reference
Use the command-line tool to query the Cluster Health Monitor repository to display node-specific metrics for a specific time period.
Use OCLUMON to perform miscellaneous administrative tasks, such as changing the debug levels, querying the version of Cluster Health Monitor, and changing the metrics database size.
• oclumon debug
Use the oclumon debug command to set the log level for the Cluster Health Monitor services.
• oclumon dumpnodeview
Use the oclumon dumpnodeview command to view log information from the system monitor service in the form of a node view.
• oclumon manage
Use the oclumon manage command to view and change configuration information from the system monitor service.
• oclumon version
Use the oclumon version command to obtain the version of Cluster Health Monitor that you are using.
B.1 oclumon debug
Use the oclumon debug command to set the log level for the Cluster Health Monitor services.
Syntax
oclumon debug [log daemon module:log_level] [version]
Parameters
Table B-1 oclumon debug Command Parameters

log daemon module:log_level
Use this option to change the log level of daemons and daemon modules.
Supported daemons are: osysmond, ologgerd, client, all
Supported daemon modules are:
osysmond: CRFMOND, CRFM, and allcomp
ologgerd: CRFLOGD, CRFLDREP, CRFM, and allcomp
client: OCLUMON, CRFM, and allcomp
all: allcomp
Supported log_level values are 0, 1, 2, and 3.

version
Use this option to display the versions of the daemons.
Example B-1 oclumon debug
The following example sets the log level of the system monitor service (osysmond):
$ oclumon debug log osysmond CRFMOND:3
The following example displays the versions of the daemons:
$ oclumon debug version
OCLUMON version :0.02
OSYSMOND version :12.01
OLOGGERD version :2.01
NODEVIEW version :12.01
Clusterware version - label date:
12.2.0.1.0 - 160825
B.2 oclumon dumpnodeview
Use the oclumon dumpnodeview command to view log information from the system monitor service in the form of a node view.
Usage Notes
A node view is a collection of all metrics collected by Cluster Health Monitor for a node at a point in time. Cluster Health Monitor attempts to collect metrics every five seconds on every node. Some metrics are static while other metrics are dynamic.
A node view consists of eight views when you display verbose output:
• SYSTEM: Lists system metrics such as CPU COUNT, CPU USAGE, and MEM USAGE
• TOP CONSUMERS: Lists the top consuming processes in the following format:
metric_name: 'process_name(process_identifier) utilization'
• CPUS: Lists statistics for each CPU
• PROCESSES: Lists process metrics such as PID, name, number of threads, memory usage, and number of file descriptors
• DEVICES: Lists device metrics such as disk read and write rates, queue length, and wait time per I/O
• NICS: Lists network interface card metrics such as network receive and send rates, effective bandwidth, and error rates
• FILESYSTEMS: Lists file system metrics, such as total, used, and available space
• PROTOCOL ERRORS: Lists any protocol errors
Without the -v option, the command generates a summary report that contains only the SYSTEM and TOP CONSUMERS views.
Syntax
oclumon dumpnodeview [-allnodes | -n node1 ...] [-last duration | -s timestamp -e timestamp]
[-i interval] [-v | [-system][-process][-procag][-device][-filesystem][-nic][-protoerr][-cpu][-topconsumer]]
[-format format type] [-dir directory [-append]]
Parameters
Table B-2 oclumon dumpnodeview Command Parameters
-allnodes
    Use this option to dump the node views of all the nodes in the cluster.

-n node1 node2
    Specify one node or several nodes in a space-delimited list for which you want to dump the node view.

-last "duration"
    Use this option to specify a duration, given in HH24:MM:SS format surrounded by double quotation marks (""), for which to retrieve the most recent metrics.
    For example: "23:05:00"

-s "time_stamp" -e "time_stamp"
    Use the -s option to specify a time stamp from which to start a range of queries and use the -e option to specify a time stamp to end the range of queries. Specify time in YYYY-MM-DD HH24:MM:SS format surrounded by double quotation marks ("").
    For example: "2011-05-10 23:05:00"
    Note: Specify these two options together to obtain a range.

-i interval
    Specify a collection interval, in five-second increments.

-v
    Displays verbose node view output.
-system, -process, -device, -filesystem, -nic, -protoerr, -cpu, -topconsumer
    Dumps each specified node view part.

-format "format type"
    Specify the output format. "format type" can be legacy, tabular, or csv. The default format is mostly tabular, with legacy used for node view parts that have only one row.

-dir directory
    Dumps the node view to files in the directory that you specify. Specify the -append option to append the output of the current run to the existing files. If you do not specify -append, then the command overwrites the existing files, if present.
    For example, the command oclumon dumpnodeview -dir dir_name dumps the data in the specified directory. If this command is run twice, it overwrites the data dumped by the previous run. Running the command with -append, for example, oclumon dumpnodeview -dir dir_name -append, appends the data of the current run to that of the previous run in the specified directory.

-procag
    Outputs the processes of the node view, aggregated by category:
    • DBBG (DB backgrounds)
    • DBFG (DB foregrounds)
    • CLUST (Cluster)
    • OTHER (other processes)

-h
    Displays online help for the oclumon dumpnodeview command.
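The -s and -e options must be specified together, each in YYYY-MM-DD HH24:MM:SS format. As a sketch, a small helper (ours, not part of oclumon) can build the argument list from two datetime values; passing the list to subprocess keeps each timestamp, which contains a space, as a single argument, matching the double-quoting the table requires:

```python
from datetime import datetime

def dumpnodeview_range_args(start, end, interval=None):
    """Build an oclumon dumpnodeview argument list for a -s/-e time range."""
    fmt = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH24:MM:SS
    args = ["oclumon", "dumpnodeview", "-allnodes",
            "-s", start.strftime(fmt), "-e", end.strftime(fmt)]
    if interval is not None:
        if interval % 5:
            raise ValueError("collection interval must be in five-second increments")
        args += ["-i", str(interval)]
    return args

args = dumpnodeview_range_args(datetime(2011, 5, 10, 22, 0, 0),
                               datetime(2011, 5, 10, 23, 5, 0), interval=30)
print(args[4])  # 2011-05-10 22:00:00
```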
Usage Notes
• In certain circumstances, data can be delayed for some time before the command replays the data. For example, the crsctl stop cluster -all command can cause data delay. After running crsctl start cluster -all, it may take several minutes before oclumon dumpnodeview shows any data collected during the interval.
• The default is to continuously dump node views. To stop continuous display, use Ctrl+C on Linux and Microsoft Windows.
• Both the local system monitor service (osysmond) and the cluster logger service (ologgerd) must be running to obtain node view dumps.
• The oclumon dumpnodeview command displays only 127 CPUs of the CPU core, omitting a CPU at random from the list.
Metric Descriptions
This section includes descriptions of the metrics in each of the views that comprise a node view, listed in the following tables.
Table B-3 oclumon dumpnodeview SYSTEM View Metric Descriptions
#pcpus
    The number of physical CPUs.
#vcpus
    Number of logical compute units.
cpuht
    CPU hyperthreading enabled (Y) or disabled (N).
chipname
    The name of the CPU vendor.
cpu
    Average CPU utilization per processing unit within the current sample interval (%).
cpuq
    Number of processes waiting in the run queue within the current sample interval.
physmemfree
    Amount of free RAM (KB).
physmemtotal
    Amount of total usable RAM (KB).
mcache
    Amount of physical RAM used for file buffers plus the amount of physical RAM used as cache memory (KB). On Microsoft Windows systems, this is the number of bytes currently being used by the file system cache.
    Note: This metric is not available on Solaris.
swapfree
    Amount of swap memory free (KB).
swaptotal
    Total amount of physical swap memory (KB).
hugepagetotal
    Total size of huge pages (KB).
    Note: This metric is not available on Solaris or Microsoft Windows systems.
hugepagefree
    Free size of huge pages (KB).
    Note: This metric is not available on Solaris or Microsoft Windows systems.
hugepagesize
    Smallest unit size of a huge page.
    Note: This metric is not available on Solaris or Microsoft Windows systems.
ior
    Average total disk read rate within the current sample interval (KB per second).
iow
    Average total disk write rate within the current sample interval (KB per second).
ios
    Average disk I/O operation rate within the current sample interval (I/O operations per second).
swpin
    Average swap in rate within the current sample interval (KB per second).
    Note: This metric is not available on Microsoft Windows systems.
swpout
    Average swap out rate within the current sample interval (KB per second).
    Note: This metric is not available on Microsoft Windows systems.
pgin
    Average page in rate within the current sample interval (pages per second).
pgout
    Average page out rate within the current sample interval (pages per second).
netr
    Average total network receive rate within the current sample interval (KB per second).
netw
    Average total network send rate within the current sample interval (KB per second).
procs
    Number of processes.
procsoncpu
    The current number of processes running on the CPU.
rtprocs
    Number of real-time processes.
rtprocsoncpu
    The current number of real-time processes running on the CPU.
#fds
    Number of open file descriptors, or number of open handles on Microsoft Windows.
#sysfdlimit
    System limit on number of file descriptors.
    Note: This metric is not available on either Solaris or Microsoft Windows systems.
#disks
    Number of disks.
#nics
    Number of network interface cards.
nicErrors
    Average total network error rate within the current sample interval (errors per second).
Table B-4 oclumon dumpnodeview PROCESSES View Metric Descriptions
name
    The name of the process executable.
pid
    The process identifier assigned by the operating system.
#procfdlimit
    Limit on number of file descriptors for this process.
    Note: This metric is not available on Microsoft Windows, AIX, and HP-UX systems.
cpuusage
    Process CPU utilization (%).
    Note: The utilization value can be up to 100 times the number of processing units.
privmem
    Process private memory usage (KB).
shm
    Process shared memory usage (KB).
    Note: This metric is not available on Microsoft Windows, Solaris, and AIX systems.
workingset
    Working set of a program (KB).
    Note: This metric is only available on Microsoft Windows.
#fd
    Number of file descriptors open by this process, or number of open handles by this process on Microsoft Windows.
#threads
    Number of threads created by this process.
priority
    The process priority.
nice
    The nice value of the process.
    Note: This metric is not applicable to Microsoft Windows systems.
state
    The state of the process.
    Note: This metric is not applicable to Microsoft Windows systems.
Table B-5 oclumon dumpnodeview DEVICES View Metric Descriptions
ior
    Average disk read rate within the current sample interval (KB per second).
iow
    Average disk write rate within the current sample interval (KB per second).
ios
    Average disk I/O operation rate within the current sample interval (I/O operations per second).
qlen
    Number of I/O requests in WAIT state within the current sample interval.
wait
    Average wait time per I/O within the current sample interval (msec).
type
    If applicable, identifies what the device is used for. Possible values are SWAP, SYS, OCR, ASM, and VOTING.
Table B-6 oclumon dumpnodeview NICS View Metric Descriptions
netrr
    Average network receive rate within the current sample interval (KB per second).
netwr
    Average network send rate within the current sample interval (KB per second).
neteff
    Average effective bandwidth within the current sample interval (KB per second).
nicerrors
    Average error rate within the current sample interval (errors per second).
pktsin
    Average incoming packet rate within the current sample interval (packets per second).
pktsout
    Average outgoing packet rate within the current sample interval (packets per second).
errsin
    Average error rate for incoming packets within the current sample interval (errors per second).
errsout
    Average error rate for outgoing packets within the current sample interval (errors per second).
indiscarded
    Average drop rate for incoming packets within the current sample interval (packets per second).
outdiscarded
    Average drop rate for outgoing packets within the current sample interval (packets per second).
inunicast
    Average packet receive rate for unicast within the current sample interval (packets per second).
type
    Whether PUBLIC or PRIVATE.
innonunicast
    Average packet receive rate for multicast (packets per second).
latency
    Estimated latency for this network interface card (msec).
Table B-7 oclumon dumpnodeview FILESYSTEMS View Metric Descriptions
total
    Total amount of space (KB).
mount
    Mount point.
type
    File system type, whether local file system, NFS, or other.
used
    Amount of used space (KB).
available
    Amount of available space (KB).
used%
    Percentage of used space (%).
ifree%
    Percentage of free file nodes (%).
    Note: This metric is not available on Microsoft Windows systems.
Table B-8 oclumon dumpnodeview PROTOCOL ERRORS View Metric Descriptions

IPHdrErr
    Number of input datagrams discarded due to errors in the IPv4 headers of the datagrams.
IPAddrErr
    Number of input datagrams discarded because the IPv4 address in their IPv4 header's destination field was not a valid address to be received at this entity.
IPUnkProto
    Number of locally addressed datagrams received successfully but discarded because of an unknown or unsupported protocol.
IPReasFail
    Number of failures detected by the IPv4 reassembly algorithm.
IPFragFail
    Number of IPv4 datagrams discarded due to fragmentation failures.
TCPFailedConn
    Number of times that TCP connections have made a direct transition to the CLOSED state from either the SYN-SENT state or the SYN-RCVD state, plus the number of times that TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state.
TCPEstRst
    Number of times that TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state.
TCPRetraSeg
    Total number of TCP segments retransmitted.
UDPUnkPort
    Total number of received UDP datagrams for which there was no application at the destination port.
UDPRcvErr
    Number of received UDP datagrams that could not be delivered for reasons other than the lack of an application at the destination port.
Table B-9 oclumon dumpnodeview CPUS View Metric Descriptions
cpuid
    Virtual CPU.
sys-usage
    CPU usage in system space.
user-usage
    CPU usage in user space.
nice
    Value of nice for a specific CPU.
usage
    CPU usage for a specific CPU.
iowait
    CPU wait time for I/O operations.
Example B-2 dumpnodeview -n
The following example dumps node views from node1, node2, and node3, collected over the last 12 hours:
$ oclumon dumpnodeview -n node1 node2 node3 -last "12:00:00"
The following example displays node views from all nodes collected over the last 15 minutes at a 30-second interval:
$ oclumon dumpnodeview -allnodes -last "00:15:00" -i 30
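The -last argument expresses the look-back window as elapsed HH24:MM:SS, so the 15-minute query above uses "00:15:00". A minimal helper (ours, not part of oclumon) to build that string:

```python
def last_duration(hours=0, minutes=0, seconds=0):
    """Format an elapsed duration as the HH24:MM:SS string expected by -last."""
    total = hours * 3600 + minutes * 60 + seconds
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

print(last_duration(hours=12))    # 12:00:00
print(last_duration(minutes=15))  # 00:15:00
```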
Example B-3 dumpnodeview -format csv
The following example shows how to use the -format csv option to output content in comma-separated values file format:
# oclumon dumpnodeview -format csv
dumpnodeview: Node name not given. Querying for the local host
----------------------------------------
Node: node1 Clock: '2016-09-02 11.18.00-0700' SerialNo:310668
----------------------------------------
SYSTEM:
"#pcpus","#cores","#vcpus","cpuht","chipname","cpuusage[%]","cpusys[%]","cpuuser[%]",
"cpunice[%]","cpuiowait[%]","cpusteal[%]","cpuq","physmemfree[KB]","physmemtotal[KB]"
,
"mcache[KB]","swapfree[KB]","swaptotal[KB]","hugepagetotal","hugepagefree","hugepages ize",
"ior[KB/S]","iow[KB/S]","ios[#/S]","swpin[KB/S]","swpout[KB/S]","pgin[#/S]","pgout[#/
S]",
"netr[KB/S]","netw[KB/
S]","#procs","#procsoncpu","#procs_blocked","#rtprocs","#rtprocsoncpu",
"#fds","#sysfdlimit","#disks","#nics","loadavg1","loadavg5","loadavg15","#nicErrors"
2,12,24,Y,"Intel(R) Xeon(R) CPU X5670 @ 2.93GHz",
68.66,5.40,63.26,0.00,0.00,0.00,0,820240,
73959636,61520568,4191424,4194300,0,0,
2048,143,525,64,0,0,0,279,600.888,437.070,951,24,0,58,N/A,
33120,6815744,13,5,19.25,17.67,16.09,0
TOPCONSUMERS:
"topcpu","topprivmem","topshm","topfd","topthread"
"java(25047) 225.44","java(24667) 1008360","ora_lms1_prod_1(28913) 4985464","polkitgnome-au(20730) 1038","java(2734) 209"
Example B-4 dumpnodeview -procag
The following example shows how to output node views, aggregated by category: DBBG (DB backgrounds), DBFG (DB foregrounds), CLUST (Cluster), and OTHER (other processes).
# oclumon dumpnodeview -procag
----------------------------------------
Node: node1 Clock: '2016-09-02 11.14.15-0700' SerialNo:310623
----------------------------------------
PROCESS AGGREGATE:
cpuusage[%]  privatemem[KB]  maxshmem[KB]  #threads    #fd  #processes  category  sid
       0.62        45791348       4985200       187  10250         183      DBBG  prod_1
       0.52        29544192       3322648       191  10463         187      DBBG  webdb_1
      17.81         8451288        967924        22    511          22      DBFG  webdb_1
      75.94        34930368       1644492        64   1067          64      DBFG  prod_1
       3.42         3139208        120256       480   3556          25     CLUST
       1.66         1989424         16568      1110   4040         471     OTHER
Example B-5 Node View Output
----------------------------------------
Node: rwsak10 Clock: '2016-05-08 02.11.25-0800' SerialNo:155631
----------------------------------------
SYSTEM:
#pcpus: 2 #vcpus: 24 cpuht: Y chipname: Intel(R) cpu: 1.23 cpuq: 0 physmemfree: 8889492 physmemtotal: 74369536 mcache: 55081824 swapfree: 18480404 swaptotal: 18480408 hugepagetotal: 0 hugepagefree: 0 hugepagesize: 2048 ior: 132 iow: 236 ios: 23 swpin: 0 swpout: 0 pgin: 131 pgout: 235 netr: 72.404
netw: 97.511 procs: 969 procsoncpu: 6 rtprocs: 62 rtprocsoncpu N/A #fds: 32640
#sysfdlimit: 6815744 #disks: 9 #nics: 5 nicErrors: 0
TOP CONSUMERS: topcpu: 'osysmond.bin(30981) 2.40' topprivmem: 'oraagent.bin(14599) 682496' topshm: 'ora_dbw2_oss_3(7049) 2156136' topfd: 'ocssd.bin(29986) 274' topthread: 'java(32255) 53'
CPUS:
.
.
cpu18: sys-2.93 user-2.15 nice-0.0 usage-5.8 iowait-0.0 steal-0.0
.
PROCESSES: name: 'osysmond.bin' pid: 30891 #procfdlimit: 65536 cpuusage: 2.40 privmem: 35808
.
.
shm: 81964 #fd: 119 #threads: 13 priority: -100 nice: 0 state: S
.
DEVICES:
.
.
sdi ior: 0.000 iow: 0.000 ios: 0 qlen: 0 wait: 0 type: SYS
sda1 ior: 0.000 iow: 61.495 ios: 629 qlen: 0 wait: 0 type: SYS
.
NICS:
lo netrr: 39.935 netwr: 39.935 neteff: 79.869 nicerrors: 0 pktsin: 25 pktsout: 25 errsin: 0 errsout: 0 indiscarded: 0 outdiscarded: 0 inunicast: 25 innonunicast: 0 type: PUBLIC
eth0 netrr: 1.412 netwr: 0.527 neteff: 1.939 nicerrors: 0 pktsin: 15 pktsout: 4 errsin: 0 errsout: 0 indiscarded: 0 outdiscarded: 0 inunicast: 15 innonunicast: 0 type: PUBLIC latency: <1
FILESYSTEMS:
.
.
mount: / type: rootfs total: 563657948 used: 78592012 available: 455971824 used%: 14 ifree%: 99 GRID_HOME
.
PROTOCOL ERRORS:
IPHdrErr: 0 IPAddrErr: 0 IPUnkProto: 0 IPReasFail: 0 IPFragFail: 0
TCPFailedConn: 5197 TCPEstRst: 717163 TCPRetraSeg: 592 UDPUnkPort: 103306
UDPRcvErr: 70
B.3 oclumon manage
Use the oclumon manage command to view and change configuration information from the system monitor service.
Syntax
oclumon manage -repos {{changeretentiontime time} | {changerepossize memory_size}} |
-get {key1 [key2 ...] | alllogger [-details] | mylogger [-details]}
Parameters
Table B-10 oclumon manage Command Parameters
-repos {{changeretentiontime time} | {changerepossize memory_size}}
    The -repos flag is required to specify the following Cluster Health Monitor repository-related options:
    • changeretentiontime time: Use this option to confirm that there is sufficient tablespace to hold the amount of Cluster Health Monitor data that can be accumulated in a specific amount of time.
      Note: This option does not change the retention time.
    • changerepossize memory_size: Use this option to change the Cluster Health Monitor repository space limit to a specified number of MB.
      Caution: If you decrease the space limit of the Cluster Health Monitor repository, then all data collected before the resizing operation is permanently deleted.

-get key1 [key2 ...]
    Use this option to obtain Cluster Health Monitor repository information using the following keywords:
    • repsize: Size of the Cluster Health Monitor repository, in seconds
    • reppath: Directory path to the Cluster Health Monitor repository
    • master: Name of the master node
    • alllogger: Special key to obtain a list of all nodes running the Cluster Logger Service
    • mylogger: Special key to obtain the node running the Cluster Logger Service that is serving the current node
    • -details: Use this option with alllogger and mylogger to list the nodes served by the Cluster Logger Service
    You can specify any number of keywords in a space-delimited list following the -get flag.

-h
    Displays online help for the oclumon manage command.
Usage Notes
• The local system monitor service must be running to change the retention time of the Cluster Health Monitor repository.
• The Cluster Logger Service must be running to change the retention time of the Cluster Health Monitor repository.
Example B-6 oclumon manage
The following examples show commands and sample output:
$ oclumon manage -get MASTER
Master = node1
$ oclumon manage -get alllogger -details
Logger = node1
Nodes = node1,node2
$ oclumon manage -repos changeretentiontime 86400
$ oclumon manage -repos changerepossize 6000
B.4 oclumon version
Use the oclumon version command to obtain the version of Cluster Health Monitor that you are using.
Syntax
oclumon version
Example B-7 oclumon version
This command produces output similar to the following:
Cluster Health Monitor (OS), Version 12.2.0.1.0 - Production
Copyright 2007, 2016 Oracle. All rights reserved.
C
Diagnostics Collection Script
Run the diagnostics collection script to provide additional information that My Oracle Support can use to resolve problems.
Syntax
diagcollection.pl {--collect [--crs | --acfs | --all] [--chmos [--incidenttime time
[--incidentduration time]]] [--adr location [--aftertime time [--beforetime time]]]
[--crshome path] | --clean | --coreanalyze}
Note: Prefix diagcollection.pl script arguments with two dashes (--).
Parameters
Table C-1 diagcollection.pl Script Parameters
--collect
    Use this parameter with any of the following arguments:
    • --crs: Use this argument to collect Oracle Clusterware diagnostic information.
    • --acfs: Use this argument to collect Oracle ACFS diagnostic information.
      Note: You can only use this argument on UNIX systems.
    • --all: (default) Use this argument to collect all diagnostic information except Cluster Health Monitor (OS) data.
    • --chmos: Use this argument to collect the following Cluster Health Monitor diagnostic information:
      --incidenttime time: Use this argument to collect Cluster Health Monitor (OS) data from the specified time.
      Note: The time format is MM/DD/YYYYHH24:MM:SS.
      --incidentduration time: Use this argument with --incidenttime to collect Cluster Health Monitor (OS) data for the duration after the specified time.
      Note: The time format is HH:MM. If you do not use --incidentduration, then all Cluster Health Monitor (OS) data after the time you specify in --incidenttime is collected.
    • --adr location: The Automatic Diagnostic Repository Command Interpreter (ADRCI) uses this argument to specify a location in which to collect diagnostic information for ADR.
    • --aftertime time: Use this argument with the --adr argument to collect archives after the specified time.
      Note: The time format is YYYYMMDDHHMISS24.
    • --beforetime time: Use this argument with the --adr argument to collect archives before the specified time.
      Note: The time format is YYYYMMDDHHMISS24.
    • --crshome path: Use this argument to override the location of the Oracle Clusterware home.
      Note: The diagcollection.pl script typically derives the location of the Oracle Clusterware home from the system configuration (either the olr.loc file or the Microsoft Windows registry), so this argument is not required.

--clean
    Use this parameter to clean up the diagnostic information gathered by the diagcollection.pl script.
    Note: You cannot use this parameter with --collect.

--coreanalyze
    Use this parameter to extract information from core files and store it in a text file.
    Note: You can only use this parameter on UNIX systems.

Related Topics:
• Oracle Database Utilities
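The two time formats above are easy to get wrong because neither uses a separator between the date and the time. As a sketch (the helper names are ours, not part of the script), both strings can be produced from a datetime with strftime:

```python
from datetime import datetime

def incident_time(dt):
    """Format a datetime as MM/DD/YYYYHH24:MM:SS for --incidenttime."""
    return dt.strftime("%m/%d/%Y%H:%M:%S")

def adr_time(dt):
    """Format a datetime as YYYYMMDDHHMISS24 for --aftertime/--beforetime."""
    return dt.strftime("%Y%m%d%H%M%S")

dt = datetime(2017, 5, 8, 23, 5, 0)
print(incident_time(dt))  # 05/08/201723:05:00
print(adr_time(dt))       # 20170508230500
```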
D
Managing the Cluster Resource Activity Log
Oracle Clusterware stores logs about resource failures in the cluster resource activity log, which is located in the Grid Infrastructure Management Repository.
Failures can occur as a result of a problem with a resource, a hosting node, or the network.
The cluster resource activity log provides precise and specific information about a resource failure, separate from diagnostic logs. The cluster resource activity log also provides a unified view of the cause of resource failure.
Use the following commands to manage and view the contents of the cluster resource activity log:
• crsctl query calog
  Query the cluster resource activity logs matching specific criteria.
• crsctl get calog maxsize
  To store Oracle Clusterware-managed resource activity information, query the maximum space allotted to the cluster resource activity log.
• crsctl get calog retentiontime
  Query the retention time of the cluster resource activity log.
• crsctl set calog maxsize
  Configure the maximum amount of space allotted to store Oracle Clusterware-managed resource activity information.
• crsctl set calog retentiontime
  Configure the retention time of the cluster resource activity log.
D.1 crsctl query calog
Query the cluster resource activity logs matching specific criteria.
Syntax
crsctl query calog [-aftertime "timestamp"] [-beforetime "timestamp"]
[-duration "time_interval" | -follow] [-filter "filter_expression"]
[-fullfmt | -xmlfmt]
Parameters
Table D-1 crsctl query calog Command Parameters
-aftertime "timestamp"
    Displays the activities logged after a specific time.
    Specify the timestamp in the YYYY-MM-DD HH24:MI:SS[.FF][TZH:TZM] or YYYY-MM-DD or HH24:MI:SS[.FF][TZH:TZM] format. TZH and TZM stand for time zone hour and minute, and FF stands for microseconds.
    If you specify [TZH:TZM], then the crsctl command assumes UTC as the time zone. If you do not specify [TZH:TZM], then the crsctl command assumes the local time zone of the cluster node from where the crsctl command is run.
    Use this parameter with -beforetime to query the activities logged at a specific time interval.

-beforetime "timestamp"
    Displays the activities logged before a specific time.
    Specify the timestamp in the YYYY-MM-DD HH24:MI:SS[.FF][TZH:TZM] or YYYY-MM-DD or HH24:MI:SS[.FF][TZH:TZM] format. TZH and TZM stand for time zone hour and minute, and FF stands for microseconds.
    If you specify [TZH:TZM], then the crsctl command assumes UTC as the time zone. If you do not specify [TZH:TZM], then the crsctl command assumes the local time zone of the cluster node from where the crsctl command is run.
    Use this parameter with -aftertime to query the activities logged at a specific time interval.

-duration "time_interval" | -follow
    Use -duration to specify a time interval that you want to query when you use the -aftertime parameter. Specify the time interval in the DD HH:MM:SS format.
    Use -follow to display a continuous stream of activities as they occur.

-filter "filter_expression"
    Query any number of fields in the cluster resource activity log using the -filter parameter.
    To specify multiple filters, use a comma-delimited list of filter expressions surrounded by double quotation marks ("").

-fullfmt | -xmlfmt
    To display cluster resource activity log data, choose full or XML format.
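Because omitting [TZH:TZM] makes the interpretation depend on the local time zone of the node where crsctl runs, it can be safer to always emit the offset. A sketch (our helper; the exact offset spelling is an assumption based on the format description above) that formats an aware datetime:

```python
from datetime import datetime, timedelta, timezone

def calog_timestamp(dt):
    """Format an aware datetime as YYYY-MM-DD HH24:MI:SS.FF with a TZH:TZM offset."""
    base = dt.strftime("%Y-%m-%d %H:%M:%S.%f")
    minutes = int(dt.utcoffset().total_seconds() // 60)
    sign = "-" if minutes < 0 else "+"
    h, m = divmod(abs(minutes), 60)
    return f"{base}{sign}{h:02d}:{m:02d}"

pdt = timezone(timedelta(hours=-7))
print(calog_timestamp(datetime(2016, 9, 27, 17, 55, 43, 152000, tzinfo=pdt)))
# 2016-09-27 17:55:43.152000-07:00
```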
Cluster Resource Activity Log Fields
Query any number of fields in the cluster resource activity log using the -filter parameter.
Table D-2 Cluster Resource Activity Log Fields
timestamp
    Description: The time when the cluster resource activities were logged.
    Use case: Use this filter to query all the activities logged at a specific time. This is an alternative to the -aftertime, -beforetime, and -duration command parameters.

writer_process_id
    Description: The ID of the process that is writing to the cluster resource activity log.
    Use case: Query only the activities spawned by a specific process.

writer_process_name
    Description: The name of the process that is writing to the cluster resource activity log.
    Use case: When you query a specific process, CRSCTL returns all the activities for that process.

writer_user
    Description: The name of the user who is writing to the cluster resource activity log.
    Use case: Query all the activities written by a specific user.

writer_group
    Description: The name of the group to which the user writing to the cluster resource activity log belongs.
    Use case: Query all the activities written by users belonging to a specific user group.

writer_hostname
    Description: The name of the host on which the cluster resource activity log is written.
    Use case: Query all the activities written by a specific host.

writer_clustername
    Description: The name of the cluster on which the cluster resource activity log is written.
    Use case: Query all the activities written by a specific cluster.

nls_product
    Description: The product of the NLS message, for example, CRS, ORA, or srvm.
    Use case: Query all the activities that have a specific product name.

nls_facility
    Description: The facility of the NLS message, for example, CRS or PROC.
    Use case: Query all the activities that have a specific facility name.

nls_id
    Description: The ID of the NLS message, for example, 42008.
    Use case: Query all the activities that have a specific message ID.

nls_field_count
    Description: The number of fields in the NLS message.
    Use case: Query all the activities that correspond to NLS messages with more than, less than, or equal to nls_field_count command parameters.

nls_field1
    Description: The first field of the NLS message.
    Use case: Query all the activities that match the first parameter of an NLS message.

nls_field1_type
    Description: The type of the first field in the NLS message.
    Use case: Query all the activities that match a specific type of the first parameter of an NLS message.

nls_format
    Description: The format of the NLS message, for example, Resource '%s' has been modified.
    Use case: Query all the activities that match a specific format of an NLS message.
nls_message
    Description: The entire NLS message that was written to the cluster resource activity log, for example, Resource 'ora.cvu' has been modified.
    Use case: Query all the activities that match a specific NLS message.

actid
    Description: The unique activity ID of every cluster activity log.
    Use case: Query all the activities that match a specific ID. Also, specify only a partial actid to list all activities where the actid is a subset of the activity ID.

is_planned
    Description: Confirms whether or not the activity is planned. For example, if a user issues the command crsctl stop crs on a node, then the stack stops and resources bounce. Running the crsctl stop crs command generates activities that are logged in the calog. Since this is a planned action, the is_planned field is set to true (1). Otherwise, the is_planned field is set to false (0).
    Use case: Query all the planned or unplanned activities.

onbehalfof_user
    Description: The name of the user on behalf of whom the cluster activity log is written.
    Use case: Query all the activities written on behalf of a specific user.

entity_isoraentity
    Description: Confirms whether or not the entity for which the calog activities are being logged is an Oracle entity. If a resource, such as ora.***, is started or stopped, for example, then all those activities are logged in the cluster resource activity log. Since ora.*** is an Oracle entity, the entity_isoraentity field is set to true (1). Otherwise the entity_isoraentity field is set to false (0).
    Use case: Query all the activities logged by Oracle or non-Oracle entities.

entity_type
    Description: The type of the entity, such as server, for which the cluster activity log is written.
    Use case: Query all the activities that match a specific entity type.

entity_name
    Description: The name of the entity, for example, foo, for which the cluster activity log is written.
    Use case: Query all the cluster activities that match a specific entity name.

entity_hostname
    Description: The name of the host, for example, node1, associated with the entity for which the cluster activity log is written.
    Use case: Query all the cluster activities that match a specific host name.

entity_clustername
    Description: The name of the cluster, for example, cluster1, associated with the entity for which the cluster activity log is written.
    Use case: Query all the cluster activities that match a specific cluster name.
Usage Notes
Combine simple filters into expressions called expression filters using Boolean operators.
Enclose timestamps and time intervals in double quotation marks ("").
Enclose the filter expressions in double quotation marks ("").
Enclose the values that contain parentheses or spaces in single quotation marks ('').
If no matching records are found, then the Oracle Clusterware Control (CRSCTL) utility displays the following message:
CRS-40002: No activities match the query.
Examples
Examples of filters include:
• "writer_user==root": Limits the display to activities written by the root user only.
• "customer_data=='GEN_RESTART@SERVERNAME(rwsbi08)=StartCompleted~'": Limits the display to customer_data that has the specified value GEN_RESTART@SERVERNAME(node1)=StartCompleted~.
To query all the resource activities and display the output in full format:
$ crsctl query calog -fullfmt
----ACTIVITY START----
timestamp           : 2016-09-27 17:55:43.152000
writer_process_id   : 6538
writer_process_name : crsd.bin
writer_user         : root
writer_group        : root
writer_hostname     : node1
writer_clustername  : cluster1-mb1
customer_data       : CHECK_RESULTS=-408040060~
nls_product         : CRS
nls_facility        : CRS
nls_id              : 2938
nls_field_count     : 1
nls_field1          : ora.cvu
nls_field1_type     : 25
nls_field1_len      : 0
nls_format          : Resource '%s' has been modified.
nls_message         : Resource 'ora.cvu' has been modified.
actid               : 14732093665106538/1816699/1
is_planned          : 1
onbehalfof_user     : grid
onbehalfof_hostname : node1
entity_isoraentity  : 1
entity_type         : resource
entity_name         : ora.cvu
entity_hostname     : node1
entity_clustername  : cluster1-mb1
----ACTIVITY END----
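Because each -fullfmt record is a set of `name : value` lines, individual fields can be pulled out of saved output with standard text tools. The snippet below runs against a trimmed sample record embedded in the script, not against live crsctl output:

```shell
# Extract one field from a saved -fullfmt record (sample data below).
record='timestamp : 2016-09-27 17:55:43.152000
entity_type : resource
entity_name : ora.cvu
entity_hostname : node1'

# Split each line on " : " and print the value whose name matches.
entity=$(printf '%s\n' "$record" | awk -F' : ' '$1 == "entity_name" { print $2 }')
printf '%s\n' "$entity"
```

Real output may pad field names for alignment, in which case a pattern match on the field name is more robust than an exact comparison.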
To query all the resource activities and display the output in XML format:
$ crsctl query calog -xmlfmt
<?xml version="1.0" encoding="UTF-8"?>
<activities>
<activity>
<timestamp>2016-09-27 17:55:43.152000</timestamp>
<writer_process_id>6538</writer_process_id>
<writer_process_name>crsd.bin</writer_process_name>
<writer_user>root</writer_user>
<writer_group>root</writer_group>
<writer_hostname>node1</writer_hostname>
<writer_clustername>cluster1-mb1</writer_clustername>
<customer_data>CHECK_RESULTS=-408040060~</customer_data>
<nls_product>CRS</nls_product>
<nls_facility>CRS</nls_facility>
<nls_id>2938</nls_id>
<nls_field_count>1</nls_field_count>
<nls_field1>ora.cvu</nls_field1>
<nls_field1_type>25</nls_field1_type>
<nls_field1_len>0</nls_field1_len>
<nls_format>Resource '%s' has been modified.</nls_format>
<nls_message>Resource 'ora.cvu' has been modified.</nls_message>
<actid>14732093665106538/1816699/1</actid>
<is_planned>1</is_planned>
<onbehalfof_user>grid</onbehalfof_user>
<onbehalfof_hostname>node1</onbehalfof_hostname>
<entity_isoraentity>1</entity_isoraentity>
<entity_type>resource</entity_type>
<entity_name>ora.cvu</entity_name>
<entity_hostname>node1</entity_hostname>
<entity_clustername>cluster1-mb1</entity_clustername>
</activity>
</activities>
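Because -xmlfmt output is plain XML, each field can also be recovered from saved output with a simple pattern match. The sample document below is abbreviated from the output above; for production use, a real XML parser such as xmllint is preferable to sed:

```shell
# Extract <entity_name> values from saved -xmlfmt output (sample XML).
xml='<?xml version="1.0" encoding="UTF-8"?>
<activities>
  <activity>
    <entity_name>ora.cvu</entity_name>
  </activity>
</activities>'

# Print the text between <entity_name> and </entity_name> on each line.
names=$(printf '%s\n' "$xml" | sed -n 's:.*<entity_name>\(.*\)</entity_name>.*:\1:p')
printf '%s\n' "$names"
```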
To query resource activities for a two-hour interval after a specific time and display the output in XML format:
$ crsctl query calog -aftertime "2016-09-28 17:55:43" -duration "0 02:00:00" -xmlfmt
<?xml version="1.0" encoding="UTF-8"?>
<activities>
<activity>
<timestamp>2016-09-28 17:55:45.992000</timestamp>
<writer_process_id>6538</writer_process_id>
<writer_process_name>crsd.bin</writer_process_name>
<writer_user>root</writer_user>
<writer_group>root</writer_group>
<writer_hostname>node1</writer_hostname>
<writer_clustername>cluster1-mb1</writer_clustername>
<customer_data>CHECK_RESULTS=1718139884~</customer_data>
<nls_product>CRS</nls_product>
<nls_facility>CRS</nls_facility>
<nls_id>2938</nls_id>
<nls_field_count>1</nls_field_count>
<nls_field1>ora.cvu</nls_field1>
<nls_field1_type>25</nls_field1_type>
<nls_field1_len>0</nls_field1_len>
<nls_format>Resource '%s' has been modified.</nls_format>
<nls_message>Resource 'ora.cvu' has been modified.</nls_message>
<actid>14732093665106538/1942009/1</actid>
<is_planned>1</is_planned>
<onbehalfof_user>grid</onbehalfof_user>
<onbehalfof_hostname>node1</onbehalfof_hostname>
<entity_isoraentity>1</entity_isoraentity>
<entity_type>resource</entity_type>
<entity_name>ora.cvu</entity_name>
<entity_hostname>node1</entity_hostname>
<entity_clustername>cluster1-mb1</entity_clustername>
</activity>
</activities>
To query resource activities at a specific time:
$ crsctl query calog -filter "timestamp=='2016-09-28 17:55:45.992000'"
2016-09-28 17:55:45.992000 : Resource 'ora.cvu' has been modified. :
14732093665106538/1942009/1 :
To query resource activities using the writer_user and customer_data filters:
$ crsctl query calog -filter "writer_user==root AND customer_data==
'GEN_RESTART@SERVERNAME(node1)=StartCompleted~'" -fullfmt
or
$ crsctl query calog -filter "(writer_user==root) AND (customer_data==
'GEN_RESTART@SERVERNAME(node1)=StartCompleted~')" -fullfmt
----ACTIVITY START----
timestamp           : 2016-09-15 17:42:57.517000
writer_process_id   : 6538
writer_process_name : crsd.bin
writer_user         : root
writer_group        : root
writer_hostname     : node1
writer_clustername  : cluster1-mb1
customer_data       : GEN_RESTART@SERVERNAME(rwsbi08)=StartCompleted~
nls_product         : CRS
nls_facility        : CRS
nls_id              : 2938
nls_field_count     : 1
nls_field1          : ora.testdb.db
nls_field1_type     : 25
nls_field1_len      : 0
nls_format          : Resource '%s' has been modified.
nls_message         : Resource 'ora.testdb.db' has been modified.
actid               : 14732093665106538/659678/1
is_planned          : 1
onbehalfof_user     : oracle
onbehalfof_hostname : node1
entity_isoraentity  : 1
entity_type         : resource
entity_name         : ora.testdb.db
entity_hostname     : node1
entity_clustername  : cluster1-mb1
----ACTIVITY END----
To query all the calogs that were generated after UTC+08:00 time "2016-11-15 22:53:08":
$ crsctl query calog -aftertime "2016-11-15 22:53:08+08:00"
To query all the calogs that were generated after UTC-08:00 time "2016-11-15 22:53:08":
$ crsctl query calog -aftertime "2016-11-15 22:53:08-08:00"
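As the two examples above show, the UTC offset is appended directly to the timestamp. A small sketch (with illustrative values) of assembling such an -aftertime argument from its parts:

```shell
# Build an -aftertime value with an explicit UTC offset.
ts='2016-11-15 22:53:08'
offset='+08:00'
aftertime="${ts}${offset}"
printf '%s\n' "$aftertime"
```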
To query all the calogs by specifying the timestamp with microseconds:
$ crsctl query calog -aftertime "2016-11-16 01:07:53.063000"
2016-11-16 01:07:53.558000 : Resource 'ora.cvu' has been modified. :
14792791129816600/2580/7 :
2016-11-16 01:07:53.562000 : Clean of 'ora.cvu' on 'rwsam02' succeeded :
14792791129816600/2580/8 :
D.2 crsctl get calog maxsize
Query the maximum space allotted to the cluster resource activity log for storing Oracle Clusterware-managed resource activity information.
Syntax
crsctl get calog maxsize
Parameters
The crsctl get calog maxsize command has no parameters.
Example
The following example returns the maximum space allotted to the cluster resource activity log to store activities:
$ crsctl get calog maxsize
CRS-6760: The maximum size of the Oracle cluster activity log is 1024 MB.
D.3 crsctl get calog retentiontime
Query the retention time of the cluster resource activity log.
Syntax
crsctl get calog retentiontime
Parameters
The crsctl get calog retentiontime command has no parameters.
Examples
The following example returns the retention time of the cluster activity log, in number of hours:
$ crsctl get calog retentiontime
CRS-6781: The retention time of the cluster activity log is 73 hours.
D.4 crsctl set calog maxsize
Configure the maximum amount of space allotted to store Oracle Clusterware-managed resource activity information.
Syntax
crsctl set calog maxsize maximum_size
Usage Notes
Specify a value, in MB, for the maximum size of the storage space that you want to allot to the cluster resource activity log.
Note:
If you reduce the amount of storage space, then the contents of the storage are lost.
Example
The following example sets the maximum amount of space for storing Oracle Clusterware-managed resource activity information to 1024 MB:
$ crsctl set calog maxsize 1024
D.5 crsctl set calog retentiontime
Configure the retention time of the cluster resource activity log.
Syntax
crsctl set calog retentiontime hours
Parameters
The crsctl set calog retentiontime command takes a number of hours as a parameter.
Usage Notes
Specify a value, in hours, for the retention time of the cluster resource activity log.
Examples
The following example sets the retention time of the cluster resource activity log to 72 hours:
$ crsctl set calog retentiontime 72
E
chactl Command Reference
The Oracle Cluster Health Advisor commands enable the Oracle Grid Infrastructure user to administer basic monitoring functionality on the targets.
• Use the chactl monitor command to start monitoring all the instances of a specific Oracle Real Application Clusters (Oracle RAC) database using the current set model.
• Use the chactl unmonitor command to stop monitoring all the instances of a specific database.
• Use the chactl status command to check the monitoring status of the running targets.
• Use the chactl config command to list all the targets being monitored, along with the current model of each target.
• Use the chactl calibrate command to create a new model that has greater sensitivity and accuracy.
• Use the chactl query diagnosis command to return problems and diagnosis, and suggested corrective actions associated with the problem for specific cluster nodes or Oracle Real Application Clusters (Oracle RAC) databases.
• Use the chactl query model command to list all Oracle Cluster Health Advisor models or to view detailed information about a specific Oracle Cluster Health Advisor model.
• Use the chactl query repository command to view the maximum retention time, number of targets, and the size of the Oracle Cluster Health Advisor repository.
• Use the chactl query calibration command to view detailed information about the calibration data of a specific target.
• Use the chactl remove model command to delete an Oracle Cluster Health Advisor model along with the calibration data and metadata of the model from the Oracle Cluster Health Advisor repository.
• Use the chactl rename model command to rename an Oracle Cluster Health Advisor model in the Oracle Cluster Health Advisor repository.
• Use the chactl export model command to export Oracle Cluster Health Advisor models.
• Use the chactl import model command to import Oracle Cluster Health Advisor models.
• Use the chactl set maxretention command to set the maximum retention time for the diagnostic data.
• Use the chactl resize repository command to resize the tablespace of the Oracle Cluster Health Advisor repository based on the current retention time and the number of targets.
E.1 chactl monitor
Use the chactl monitor command to start monitoring all the instances of a specific Oracle Real Application Clusters (Oracle RAC) database using the current set model.
Oracle Cluster Health Advisor monitors all instances of this database using the same model assigned to the database.
Oracle Cluster Health Advisor uses the Oracle-supplied gold model when you start monitoring a target for the first time, and stores the monitoring status of the target in its internal store. Oracle Cluster Health Advisor starts monitoring any new database instance when it detects or redetects the new instance.
Syntax
chactl monitor database -db db_unique_name [-model model_name [-force]] [-help]
chactl monitor cluster [-model model_name [-force]]
Parameters
Table E-1 chactl monitor Command Parameters
db_unique_name
Specify the name of the database.

model_name
Specify the name of the model.

-force
Use the -force option to monitor with the specified model without first stopping the current monitoring of the target. Without the -force option, run chactl unmonitor first, and then run chactl monitor with the model name.
Examples
• To monitor the SalesDB database using the BlkFridayShopping default model:
$ chactl monitor database -db SalesDB -model BlkFridayShopping
• To monitor the InventoryDB database using the Nov2014 model:
$ chactl monitor database -db InventoryDB -model Nov2014
If you specify the model_name, then Oracle Cluster Health Advisor starts monitoring with the specified model and stores the model in the Oracle Cluster Health Advisor internal store.
If you use both the -model and -force options, then Oracle Cluster Health Advisor stops monitoring and restarts monitoring with the specified model.
• To monitor the SalesDB database using the Dec2014 model:
$ chactl monitor database -db SalesDB -model Dec2014
• To monitor the InventoryDB database using the Dec2014 model and the -force option:
$ chactl monitor database -db InventoryDB -model Dec2014 -force
Error Messages
Error: no CHA resource is running in the cluster.
Description: Returns when there is no hub or leaf node running the Oracle Cluster Health Advisor service.
Error: the database is not configured.
Description: Returns when the database is not found in either the Oracle Cluster Health Advisor configuration repository or as a CRS resource.
Error: input string “xc#? %” is invalid.
Description: Returns when the command-line cannot be parsed. Also displays the top-level help text.
Error: CHA is already monitoring target <dbname>.
Description: Returns when the database is already monitored.
E.2 chactl unmonitor
Use the chactl unmonitor command to stop monitoring all the instances of a specific database.
Syntax
chactl unmonitor database -db db_unique_name [-help]
Examples
To stop monitoring the SalesDB database:
$ chactl unmonitor database -db SalesDB
Database SalesDB is not monitored
E.3 chactl status
Use the chactl status command to check the monitoring status of the running targets.
If you do not specify any parameters, then the chactl status command returns the status of all running targets.
The monitoring status of an Oracle Cluster Health Advisor target can be either Monitoring or Not Monitoring. The chactl status command shows four types of results, depending on whether you specify a target and the -verbose option.
The -verbose option also displays the monitoring status of targets contained within the specified target and the names of the executing models of each printed target. The chactl status command displays targets with positive monitoring status only. The chactl status command displays negative monitoring status only when the corresponding target is explicitly specified on the command line.
Syntax
chactl status {cluster|database [-db db_unique_name]} [-verbose][-help]
Examples
• To display the list of cluster nodes and databases being monitored:
#chactl status
Monitoring nodes rac1Node1, rac1Node2
Monitoring databases SalesDB, HRdb
Note:
A database is displayed with Monitoring status if Oracle Cluster Health Advisor is monitoring one or more instances of the database, even if some instances of the database are not running.
• To display the status of Oracle Cluster Health Advisor:
$ chactl status
Cluster Health Advisor service is offline.
No target or -verbose option is specified on the command line, and Oracle Cluster Health Advisor is not running on any node of the cluster.
• To display various Oracle Cluster Health Advisor monitoring states for cluster nodes and databases:
$ chactl status database -db SalesDB
Monitoring database SalesDB
$ chactl status database -db bogusDB
Not Monitoring database bogusDB
$ chactl status cluster
Monitoring nodes rac1,rac2
Not Monitoring node rac3
or
$ chactl status cluster
Cluster Health Advisor is offline
• To display the detailed Oracle Cluster Health Advisor monitoring status for the entire cluster:
$ chactl status -verbose
Monitoring node(s) racNd1, racNd2, racNd3, racNd4 using model MidSparc
Monitoring database HRdb2, Instances HRdb2I1, HRdb2I2 in server pool SilverPool using model M6
Monitoring database HRdb, Instances HRdbI4, HRdbI6 in server pool SilverPool using model M23
Monitoring database testHR, Instances inst3 on node racN7 using model TestM13
Monitoring database testHR, Instances inst4 on node racN8 using model TestM14
When the target is not specified and the -verbose option is specified, the chactl status command displays the status of the database instances and the names of the models.
E.4 chactl config
Use the chactl config command to list all the targets being monitored, along with the current model of each target.
If the specified target is a multitenant container database (CDB) or a cluster, then the chactl config command also displays the configuration data status.
Syntax
chactl config {cluster|database -db db_unique_name}[-help]
Examples
To display the monitor configuration and the specified model of each target:
$ chactl config
Databases monitored: prodDB, hrDB
$ chactl config database -db prodDB
Monitor: Enabled
Model: GoldDB
$ chactl config cluster
Monitor: Enabled
Model: DEFAULT_CLUSTER
E.5 chactl calibrate
Use the chactl calibrate command to create a new model that has greater sensitivity and accuracy.
User-generated models are effective for monitored Oracle Real Application Clusters (Oracle RAC) systems in your operating environment because they use calibration data from the target. Oracle Cluster Health Advisor adds the user-generated model to the list of available models and stores the new model in the Oracle Cluster Health Advisor repository.
If a model with the same name exists, then overwrite the old model with the new one by using the -force option.
Key Performance and Workload Indicators
A set of metrics, or Key Performance Indicators, describes high-level constraints on the training data selected for calibration. This set consists of metrics that describe performance goals and resource utilization bandwidth, for example, response times or CPU utilization.
The Key Performance Indicators are also operating system and database signals which are monitored, estimated, and associated with fault detection logic. Most of these Key Performance Indicators are also either predictors, that is, their state is correlated with the state of other signals, or predicted by other signals. The fact that the Key Performance Indicators correlate with other signals makes them useful as filters for the training or calibration data.
The Key Performance Indicator ranges are used in the query calibration and calibrate commands to filter out data points.
The following Key Performance Indicators are supported for database:
• CPUPERCENT - CPU utilization - Percent
• IOREAD - Disk read - Mbyte/sec
• DBTIMEPERCALL - Database time per user call - usec/call
• IOWRITE - Disk write - Mbyte/sec
• IOTHROUGHPUT - Disk throughput - IO/sec
The following Key Performance Indicators are supported for cluster:
• CPUPERCENT - CPU utilization - Percent
• IOREAD - Disk read - Mbyte/sec
• IOWRITE - Disk write - Mbyte/sec
• IOTHROUGHPUT - Disk throughput - IO/sec
Syntax
chactl calibrate {cluster|database -db db_unique_name} -model model_name
[-force] [-timeranges 'start=time_stamp,end=time_stamp,...']
[-kpiset 'name=kpi_name min=val max=val,...' ][-help]
Specify the timestamp in the YYYY-MM-DD HH24:MI:SS format.
Examples
chactl calibrate database -db oracle -model weekday
-timeranges 'start=2016-09-09 16:00:00,end=2016-09-09 23:00:00'

chactl calibrate database -db oracle -model weekday
-timeranges 'start=2016-09-09 16:00:00,end=2016-09-09 23:00:00'
-kpiset 'name=CPUPERCENT min=10 max=60'
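Because the -timeranges and -kpiset values contain commas and spaces, it can help to assemble them in variables before quoting them on the command line. A sketch with illustrative values:

```shell
# Assemble the -timeranges and -kpiset argument values from parts.
start='2016-09-09 16:00:00'
end='2016-09-09 23:00:00'
timeranges="start=${start},end=${end}"
kpiset='name=CPUPERCENT min=10 max=60'
printf '%s\n' "$timeranges"
printf '%s\n' "$kpiset"
```

Each variable is then passed as a single quoted argument, for example `-timeranges "$timeranges"`.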
Error Messages
Error: input string “xc#? %” is misconstructed
Description: Confirm whether the given model name exists; if it does, the Warning: model_name already exists, please use [-force] message is displayed.
Error: start_time and/or end_time are misconstructed
Description: Input time specifiers are badly constructed.
Error: no sufficient calibration data exists for the specified period, please reselect another period
Description: Evaluator couldn’t find enough calibration data.
E.6 chactl query diagnosis
Use the chactl query diagnosis command to return problems and diagnosis, and suggested corrective actions associated with the problem for specific cluster nodes or Oracle Real Application Clusters (Oracle RAC) databases.
Syntax
chactl query diagnosis [-cluster|-db db_unique_name] [-start time -end time] [-htmlfile file_name] [-help]
Specify date and time in the YYYY-MM-DD HH24:MI:SS format.
In the preceding syntax, you must consider the following points:
• If you do not provide any options, then the chactl query diagnosis command returns the current state of all monitored nodes and databases. The chactl query diagnosis command reports the general state of the targets, for example, ABNORMAL, by showing their diagnostic identifier, for example, Storage Bandwidth Saturation. This is a quick way to check for any ABNORMAL state in a database or cluster.
• If you provide a time option after the target name, then the chactl query diagnosis command returns the state of the specified target restricted to the conditions in the specified time interval. The compressed time series lists the identifiers of the causes for distinct incidents that occurred in the time interval, with their start and end times.
• If an incident and cause recur in a specific time interval, then the problem is reported only once. The start time is the start time of the first occurrence of the incident and the end time is the end time of the last occurrence of the incident in the particular time interval.
• If you specify the -db option without a database name, then the chactl query diagnosis command displays diagnostic information for all databases. However, if a database name is specified, then the chactl query diagnosis command displays diagnostic information for all instances of the database that are being monitored.
• If you specify the -cluster option without a host name, then the chactl query diagnosis command displays diagnostic information for all hosts in that cluster.
• If you do not specify a time interval, then the chactl query diagnosis command displays only the current issues for all or the specified targets. The chactl query diagnosis command does not display the frequency statistics explicitly. However, you can count the number of normal and abnormal events that occurred in a target in the last 24 hours.
• If no incidents have occurred during the specified time interval, then the chactl query diagnosis command returns a text message, for example, Database/host is operating NORMALLY, or no incidents were found.
• If the state of a target is NORMAL, the command does not report it. The chactl query diagnosis command reports only the targets with ABNORMAL state for the specified time interval.
Output parameters:
• Incident start Time
• Incident end time (only for the default database and/or host, non-verbose output)
• Target (for example, database, host)
• Problem
Description: Detailed description of the problem
Cause: Root cause of the problem and contributing factors
• Action: an action that corrects the abnormal state covered in the diagnosis
Reporting Format: The diagnostic information is displayed in a time compressed or time series order, grouped by components.
Examples
To display diagnostic information of a database for a specific time interval:
$ chactl query diagnosis -db oltpacdb -start "2016-02-01 02:52:50.0" -end
"2016-02-01 03:19:15.0"
2016-02-01 01:47:10.0 Database oltpacdb DB Control File IO Performance
(oltpacdb_1) [detected]
2016-02-01 01:47:10.0 Database oltpacdb DB Control File IO Performance
(oltpacdb_2) [detected]
2016-02-01 02:52:15.0 Database oltpacdb DB CPU Utilization (oltpacdb_2) [detected]
2016-02-01 02:52:50.0 Database oltpacdb DB CPU Utilization (oltpacdb_1) [detected]
2016-02-01 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2016-02-01 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
Problem: DB Control File IO Performance
Description: CHA has detected that reads or writes to the control files are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that reads or writes to the control files were slow because of an increase in disk IO.
The slow control file reads and writes may have an impact on checkpoint and Log Writer (LGWR) performance.
Action: Separate the control files from other database files and move them to faster disks or Solid State Devices.
Problem: DB CPU Utilization
Description: CHA detected larger than expected CPU utilization for this database.
Cause: The Cluster Health Advisor (CHA) detected an increase in database CPU utilization because of an increase in the database workload.
Action: Identify the CPU-intensive queries by using the Automatic Database Diagnostic Monitor (ADDM) and follow the recommendations given there. Limit the number of CPU-intensive queries, or relocate sessions to less busy machines. Add CPUs if the CPU capacity is insufficient to support the load without a performance degradation or effects on other databases.
Problem: DB Log File Switch
Description: CHA detected that database sessions are waiting longer than expected for log switch completions.
Cause: The Cluster Health Advisor (CHA) detected high contention during log switches because the redo log files were small and the redo logs switched frequently.
Action: Increase the size of the redo logs.
Error Messages
Message: Target is operating normally
Description: No incidents are found on the target.
Message: No data was found for active Target
Description: No data was found, but the target was operating or active at the time of the query.
Message: Target is not active or was not being monitored.
Description: No data was found because the target was not monitored at the time of the query.
E.7 chactl query model
Use the chactl query model command to list all Oracle Cluster Health Advisor models or to view detailed information about a specific Oracle Cluster Health Advisor model.
Syntax
chactl query model [-name model_name [-verbose]][-help]
Examples
• To list all base Oracle Cluster Health Advisor models:
$ chactl query model
Models: MOD1, MOD2, MOD3, MOD4, MOD5, MOD6, MOD7
$ chactl query model -name weekday
Model: weekday
Target Type: DATABASE
Version: 12.2.0.1_0
OS Calibrated on: Linux amd64
Calibration Target Name: prod
Calibration Date: 2016-09-10 12:59:49
Calibration Time Ranges: start=2016-09-09 16:00:00,end=2016-09-09 23:00:00
Calibration KPIs: not specified
• To view detailed information, including calibration metadata, about a specific Oracle Cluster Health Advisor model:
$ chactl query model -name MOD5 -verbose
Model: MOD5
CREATION_DATE: Jan 10,2016 10:10
VALIDATION_STATUS: Validated
DATA_FROM_TARGET : inst72, inst75
USED_IN_TARGET : inst76, inst75, prodDB, evalDB-evalSP
CAL_DATA_FROM_DATE: Jan 05,2016 10:00
CAL_DATA_TO_DATE: Jan 07,2016 13:00
CAL_DATA_FROM_TARGETS inst73, inst75
...
E.8 chactl query repository
Use the chactl query repository command to view the maximum retention time, number of targets, and the size of the Oracle Cluster Health Advisor repository.
Syntax
chactl query repository [-help]
Examples
To view information about the Oracle Cluster Health Advisor repository:
$ chactl query repository

specified max retention time(hrs) : 72
available retention time(hrs)     : 212
available number of entities      : 2
allocated number of entities      : 0
total repository size(gb)         : 2.00
allocated repository size(gb)     : 0.07
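This output is again `name : value` text, so a single value such as the available retention time can be read back with awk. The snippet parses a saved sample embedded in the script, not live chactl output:

```shell
# Read one value out of saved 'chactl query repository' output.
out='specified max retention time(hrs) : 72
available retention time(hrs) : 212
total repository size(gb) : 2.00'

# Split on " : " and print the value for the matching line.
avail=$(printf '%s\n' "$out" | awk -F' : ' '/available retention time/ { print $2 }')
printf '%s\n' "$avail"
```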
E.9 chactl query calibration
Use the chactl query calibration command to view detailed information about the calibration data of a specific target.
Syntax
chactl query calibration {-cluster|-db db_unique_name} [-timeranges
'start=time_stamp,end=time_stamp,...'] [-kpiset 'name=kpi_name min=val max=val,...' ] [-interval val][-help]
Specify the interval in hours.
Specify date and time in the YYYY-MM-DD HH24:MI:SS format.
Note:
If you do not specify a time interval, then the chactl query calibration command displays all the calibration data collected for a specific target.
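A quick shape check before passing timestamps to -timeranges can catch malformed values early. The grep pattern below is a sketch: it verifies only the YYYY-MM-DD HH24:MI:SS layout, not calendar validity.

```shell
# Verify a timestamp matches the YYYY-MM-DD HH24:MI:SS shape.
ts='2016-07-26 01:00:00'
if printf '%s\n' "$ts" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$'; then
  result=valid
else
  result=invalid
fi
printf '%s\n' "$result"
```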
The following Key Performance Indicators are supported for database:
• CPUPERCENT - CPU utilization - Percent
• IOREAD - Disk read - Mbyte/sec
• DBTIMEPERCALL - Database time per user call - usec/call
• IOWRITE - Disk write - Mbyte/sec
• IOTHROUGHPUT - Disk throughput - IO/sec
The following Key Performance Indicators are supported for cluster:
• CPUPERCENT - CPU utilization - Percent
• IOREAD - Disk read - Mbyte/sec
• IOWRITE - Disk write - Mbyte/sec
• IOTHROUGHPUT - Disk throughput - IO/sec
Examples
To view detailed information about the calibration data of the specified target:
$ chactl query calibration -db oltpacdb -timeranges
'start=2016-07-26 01:00:00,end=2016-07-26 02:00:00,start=2016-07-26
03:00:00,end=2016-07-26 04:00:00'
-kpiset 'name=CPUPERCENT min=20 max=40, name=IOTHROUGHPUT min=500 max=9000' -interval 2
Database name : oltpacdb
Start time : 2016-07-26 01:03:10
End time : 2016-07-26 01:57:25
Total Samples : 120
Percentage of filtered data : 8.32%
The number of data samples may not be sufficient for calibration.
1) Disk read (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
4.96 0.20 8.98 0.06 25.68
<25 <50 <75 <100 >=100
97.50% 2.50% 0.00% 0.00% 0.00%
2) Disk write (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
27.73 9.72 31.75 4.16 109.39
<50 <100 <150 <200 >=200
73.33% 22.50% 4.17% 0.00% 0.00%
3) Disk throughput (ASM) (IO/sec)
MEAN MEDIAN STDDEV MIN MAX
2407.50 1500.00 1978.55 700.00 7800.00
<5000 <10000 <15000 <20000 >=20000
83.33% 16.67% 0.00% 0.00% 0.00%
4) CPU utilization (total) (%)
MEAN MEDIAN STDDEV MIN MAX
21.99 21.75 1.36 20.00 26.80
<20 <40 <60 <80 >=80
0.00% 100.00% 0.00% 0.00% 0.00%
5) Database time per user call (usec/call)
MEAN MEDIAN STDDEV MIN MAX
267.39 264.87 32.05 205.80 484.57
<10000000 <20000000 <30000000 <40000000 <50000000 <60000000 <70000000
>=70000000
100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Database name : oltpacdb
Start time : 2016-07-26 03:00:00
End time : 2016-07-26 03:53:30
Total Samples : 342
Percentage of filtered data : 23.72%
The number of data samples may not be sufficient for calibration.
1) Disk read (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
12.18 0.28 16.07 0.05 60.98
<25 <50 <75 <100 >=100
64.33% 34.50% 1.17% 0.00% 0.00%
2) Disk write (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
57.57 51.14 34.12 16.10 135.29
<50 <100 <150 <200 >=200
49.12% 38.30% 12.57% 0.00% 0.00%
3) Disk throughput (ASM) (IO/sec)
MEAN MEDIAN STDDEV MIN MAX
5048.83 4300.00 1730.17 2700.00 9000.00
<5000 <10000 <15000 <20000 >=20000
63.74% 36.26% 0.00% 0.00% 0.00%
4) CPU utilization (total) (%)
MEAN MEDIAN STDDEV MIN MAX
23.10 22.80 1.88 20.00 31.40
<20 <40 <60 <80 >=80
0.00% 100.00% 0.00% 0.00% 0.00%
5) Database time per user call (usec/call)
MEAN MEDIAN STDDEV MIN MAX
744.39 256.47 2892.71 211.45 45438.35
<10000000 <20000000 <30000000 <40000000 <50000000 <60000000 <70000000
>=70000000
100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
E.10 chactl remove model
Use the chactl remove model command to delete an Oracle Cluster Health Advisor model along with the calibration data and metadata of the model from the Oracle Cluster Health Advisor repository.
Note:
If the model is being used to monitor targets, then the chactl remove model command cannot delete the model.
Syntax
chactl remove model -name model_name [-help]
Error Message
Error: model_name does not exist
Description: The specified Oracle Cluster Health Advisor model does not exist in the Oracle Cluster Health Advisor repository.
E.11 chactl rename model
Use the chactl rename model command to rename an Oracle Cluster Health Advisor model in the Oracle Cluster Health Advisor repository.
Assign a descriptive and unique name to the model. Oracle Cluster Health Advisor preserves all the links related to the renamed model.
Syntax
chactl rename model -from model_name -to model_name [-help]
Error Messages
Error: model_name does not exist
Description: The specified model name does not exist in the Oracle Cluster Health Advisor repository.
Error: dest_name already exist
Description: The specified model name already exists in the Oracle Cluster Health Advisor repository.
E.12 chactl export model
Use the chactl export model command to export Oracle Cluster Health Advisor models.
Syntax
chactl export model -name model_name -file output_file [-help]
Example
$ chactl export model -name weekday -file /tmp//weekday.mod
E.13 chactl import model
Use the chactl import model command to import Oracle Cluster Health Advisor models.
Syntax
chactl import model -name model_name -file model_file [-force] [-help]
While importing, if there is an existing model with the same name as the model being imported, then use the -force option to overwrite it.
Example E-1 Example
$ chactl import model -name weekday -file /tmp/weekday.mod
E.14 chactl set maxretention
Use the chactl set maxretention command to set the maximum retention time for the diagnostic data.
The default and minimum retention time is 72 hours. If the Oracle Cluster Health Advisor repository does not have enough space, then the retention time is decreased for all the targets.
Note:
Oracle Cluster Health Advisor stops monitoring if the retention time is less than 24 hours.
Syntax
chactl set maxretention -time retention_time [-help]
Specify the retention time in hours.
Examples
To set the maximum retention time to 80 hours:
$ chactl set maxretention -time 80
max retention successfully set to 80 hours
Error Message
Error:
Specified time is smaller than the allowed minimum
Description: This message is returned if the input value for maximum retention time is smaller than the minimum value.
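As a rough illustration of this validation, here is a shell sketch. The variable names are hypothetical; chactl performs the equivalent check internally.

```shell
# Hypothetical sketch of the retention check: values below the
# 72-hour minimum are rejected, mirroring the error message above.
requested=80
minimum=72

if [ "$requested" -lt "$minimum" ]; then
  echo "Error: Specified time is smaller than the allowed minimum"
else
  echo "max retention successfully set to $requested hours"
fi
```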
E.15 chactl resize repository
Use the chactl resize repository command to resize the tablespace of the Oracle Cluster Health Advisor repository based on the current retention time and the number of targets.
Note:
The chactl resize repository command fails if your system does not have enough free disk space or if the tablespace contains data beyond the requested resize value.
Syntax
chactl resize repository -entities total number of hosts and database instances [-force | -eval] [-help]
Examples
To set the number of targets in the tablespace to 32:
$ chactl resize repository -entities 32
repository successfully resized for 32 targets
F
Oracle Trace File Analyzer Command-Line and Shell Options
The Trace File Analyzer control utility, TFACTL, is the command-line interface for Oracle Trace File Analyzer.
TFACTL provides a command-line and shell interface to Oracle Trace File Analyzer commands for:
• Administration
• Summary and analysis
• Diagnostic collection
The tfactl commands that you can run depend on your access level.
• You need root access or sudo access to tfactl to run administration commands.
• Run a subset of commands as:
  – An Oracle Database home owner or Oracle Grid Infrastructure home owner
  – A member of OS DBA or ASM groups
You gain access to summary, analysis, and diagnostic collection functionality by running the commands as an Oracle Database home owner or Oracle Grid Infrastructure home owner.
To grant other users access to tfactl:
tfactl access
To use tfactl as a command-line tool:
tfactl command [options]
To use tfactl as a shell interface, start tfactl and then enter commands as needed:
$ tfactl
tfactl>
Append the -help option to any tfactl command to obtain command-specific help:
$ tfactl command -help
• Running Administration Commands
  You need root access or sudo access to tfactl to run all administration commands.
• Running Summary and Analysis Commands
  Use these commands to view the summary of deployment and status of Oracle Trace File Analyzer, and changes and events detected by Oracle Trace File Analyzer.
• Running Diagnostic Collection Commands
  Run the diagnostic collection commands to collect diagnostic data.
F.1 Running Administration Commands
You need root access or sudo access to tfactl to run all administration commands.
Table F-1 Basic TFACTL commands

tfactl start
    Starts the Oracle Trace File Analyzer daemon on the local node.
tfactl stop
    Stops the Oracle Trace File Analyzer daemon on the local node.
tfactl enable
    Enables automatic restart of the Oracle Trace File Analyzer daemon after a failure or system reboot.
tfactl disable
    Stops any running Oracle Trace File Analyzer daemon and disables automatic restart.
tfactl uninstall
    Removes Oracle Trace File Analyzer from the local node.
tfactl syncnodes
    Generates and copies Oracle Trace File Analyzer certificates from one Oracle Trace File Analyzer node to other nodes.
tfactl restrictprotocol
    Restricts the use of certain protocols.
tfactl status
    Checks the status of an Oracle Trace File Analyzer process. The output is the same as tfactl print status.
• Use the tfactl diagnosetfa command to collect Oracle Trace File Analyzer diagnostic data from the local node to identify issues with Oracle Trace File Analyzer.
• Use the tfactl host command to add hosts to, or remove hosts from, the Oracle Trace File Analyzer configuration.
• Use the tfactl set command to enable, disable, or modify various Oracle Trace File Analyzer functions.
• Use the tfactl access command to allow non-root users controlled access to Oracle Trace File Analyzer and to run diagnostic collections.
F.1.1 tfactl diagnosetfa
Use the tfactl diagnosetfa command to collect Oracle Trace File Analyzer diagnostic data from the local node to identify issues with Oracle Trace File Analyzer.
Syntax
tfactl diagnosetfa [-repo repository] [-tag tag_name] [-local]
Parameters
Table F-2 tfactl diagnosetfa Command Parameters

-repo repository
    Specify the repository directory for Oracle Trace File Analyzer diagnostic collections.
-tag tag_name
    Oracle Trace File Analyzer collects the files into the tag_name directory.
-local
    Runs Oracle Trace File Analyzer diagnostics only on the local node.
F.1.2 tfactl host
Use the tfactl host command to add hosts to, or remove hosts from, the Oracle Trace File Analyzer configuration.
Syntax
tfactl host [add host_name | remove host_name]
Specify a host name to add or remove, as in the following example:
$ tfactl host add myhost.example.com
Usage Notes
View the current list of hosts in the Oracle Trace File Analyzer configuration using the tfactl print hosts command, which lists the hosts that are part of the Oracle Trace File Analyzer cluster:
$ tfactl print hosts
Host Name : node1
Host Name : node2
When you add a new host, Oracle Trace File Analyzer contacts the Oracle Trace File Analyzer instance on the other host. Oracle Trace File Analyzer authenticates the new host using certificates, and both Oracle Trace File Analyzer instances synchronize their respective host lists. Oracle Trace File Analyzer does not add the new host until the certificates are synchronized.
After you successfully add a host, all cluster-wide commands are activated on all nodes registered in the Berkeley database.
F.1.3 tfactl set
Use the tfactl set command to enable, disable, or modify various Oracle Trace File Analyzer functions.
Syntax
tfactl set [autodiagcollect=ON | OFF] [cookie=UID] [autopurge=ON | OFF]
[minagetopurge=n]
[trimfiles=ON | OFF] [tracelevel=COLLECT | SCAN | INVENTORY | OTHER:1 | 2 | 3 | 4]
[manageLogsAutoPurge=ON | OFF] [manageLogsAutoPurgePolicyAge=nd|h]
[manageLogsAutoPurgeInterval=minutes] [diskUsageMon=ON|OFF]
[diskUsageMonInterval=minutes] [reposizeMB=number]
[repositorydir=directory] [logsize=n [-local]] [logcount=n
[-local]] [-c]
Parameters
Table F-3 tfactl set Command Parameters

autodiagcollect=ON | OFF
    When set to OFF (default), automatic diagnostic collection is disabled. If set to ON, then Oracle Trace File Analyzer automatically collects diagnostics when certain patterns occur while Oracle Trace File Analyzer scans the alert logs. To set automatic collection for all nodes of the Oracle Trace File Analyzer cluster, you must specify the -c parameter.
autopurge=ON | OFF
    When set to ON (default), enables automatic purging of collections when Oracle Trace File Analyzer observes less space in the repository.
minagetopurge=n
    Set the minimum age, in hours, for a collection before Oracle Trace File Analyzer considers it for purging (default is 12 hours).
trimfiles=ON | OFF
    When set to ON, Oracle Trace File Analyzer trims the files to include only the relevant data when diagnostic collection is done as part of a scan. Note: When using tfactl diagcollect, the parameters you specify determine the time range for trimming. Oracle recommends that you not set this parameter to OFF, because untrimmed data can consume much space.
tracelevel=COLLECT | SCAN | INVENTORY | OTHER:1 | 2 | 3 | 4
    You can set trace levels for certain operations, including INVENTORY:n, SCAN:n, COLLECT:n, and OTHER:n. In this syntax, n is a number from 1 to 4 and OTHER includes all messages not relevant to the first three components. Note: Do not change the tracing level unless you are directed to do so by My Oracle Support.
diskUsageMon=ON | OFF
    Turns monitoring disk usage and recording snapshots ON (default) or OFF. Oracle Trace File Analyzer stores the snapshots under tfa/repository/suptools/node/managelogs/usage_snapshot/.
diskUsageMonInterval=minutes
    Specify the time interval between snapshots (60 minutes by default).
manageLogsAutoPurge=ON | OFF
    Turns automatic purging on or off (ON by default in DSC and OFF by default elsewhere).
manageLogsAutoPurgePolicyAge=nd|h
    Age of logs to be purged (30 days by default).
manageLogsAutoPurgeInterval=minutes
    Specify the purge frequency (default is 60 minutes).
reposizeMB=number
    Sets the maximum size, in MB, of the collection repository.
repositorydir=directory
    Specify the collection repository directory.
logsize=n [-local]
    Sets the maximum size, in MB, of each log before Oracle Trace File Analyzer rotates to a new log (default is 50 MB). Use the -local parameter to apply the change only to the local node.
logcount=n [-local]
    Sets the maximum number of logs of the specified size that Oracle Trace File Analyzer retains (default is 10). Use the -local parameter to apply the change only to the local node.
-c
    Propagates these settings to all nodes in the Oracle Trace File Analyzer configuration.
Example
The following example enables automatic diagnostic collection, sets the trace level, and sets a maximum limit for the collection repository:
$ tfactl set autodiagcollect=ON reposizeMB=20480
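The logsize and logcount settings describe a conventional size-based log rotation policy. The following shell sketch illustrates the idea only; the file name and the 1 KB threshold are invented for the demo, and Oracle Trace File Analyzer's internal rotation is not implemented this way verbatim.

```shell
# Demonstration of size-based rotation with a retention count,
# the policy that logsize (max size) and logcount (max logs) control.
logfile=/tmp/tfa_rotation_demo.log
maxsize=1024   # bytes here; tfactl's logsize is specified in MB
keep=3         # analogous to logcount

# Create an oversized log for the demo.
printf 'x%.0s' $(seq 1 2048) > "$logfile"

if [ "$(wc -c < "$logfile")" -gt "$maxsize" ]; then
  i=$keep
  while [ "$i" -gt 1 ]; do            # shift old copies: .2 -> .3, .1 -> .2
    prev=$((i - 1))
    if [ -f "$logfile.$prev" ]; then
      mv "$logfile.$prev" "$logfile.$i"
    fi
    i=$prev
  done
  mv "$logfile" "$logfile.1"          # current log becomes .1
  : > "$logfile"                      # start a fresh, empty log
fi
```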
F.1.4 tfactl access
Use the tfactl access command to allow non-root users controlled access to Oracle Trace File Analyzer and to run diagnostic collections.
Non-root users can run a subset of tfactl commands, which gives them controlled access to Oracle Trace File Analyzer and the ability to run diagnostic collections. However, root access is still required to install and administer Oracle Trace File Analyzer. Control non-root users and groups using the tfactl access command; add or remove non-root users and groups depending upon your business requirements.
Note:
By default, all Oracle home owners, OS DBA groups, and ASM groups are added to the Oracle Trace File Analyzer Access Manager list while installing or upgrading Oracle Trace File Analyzer.
Syntax
tfactl access [ lsusers |
  add -user user_name [-group group_name] [-local] |
  remove -user user_name [-group group_name] [-all] [-local] |
  block -user user_name [-local] |
  unblock -user user_name [-local] |
  enable [-local] |
  disable [-local] |
  reset [-local] |
  removeall [-local] ]
Parameters
Table F-4 tfactl access Command Parameters

lsusers
    Lists all the Oracle Trace File Analyzer users and groups.
enable
    Enables Oracle Trace File Analyzer access for non-root users. Use the -local flag to change settings only on the local node.
disable
    Disables Oracle Trace File Analyzer access for non-root users. However, the list of users who were granted access is retained, in case access for non-root users is enabled later. Use the -local flag to change settings only on the local node.
add
    Adds a user or a group to the Oracle Trace File Analyzer access list.
remove
    Removes a user or a group from the Oracle Trace File Analyzer access list.
block
    Blocks Oracle Trace File Analyzer access for a non-root user. Use this command to block a specific user even though the user is a member of a group that is granted access to Oracle Trace File Analyzer.
unblock
    Enables Oracle Trace File Analyzer access for non-root users who were blocked earlier. Use this command to unblock a user that was blocked earlier by running the tfactl access block command.
reset
    Resets to the default access list, which includes all Oracle home owners and DBA groups.
removeall
    Removes all Oracle Trace File Analyzer users and groups from the access list, including the default users and groups.
Examples
To add a user (for example, abc) to the Oracle Trace File Analyzer access list and enable access to Oracle Trace File Analyzer across the cluster:
/u01/app/tfa/bin/tfactl access add -user abc
To add all members of a group (for example, xyz) to the Oracle Trace File Analyzer access list and enable access to Oracle Trace File Analyzer on the local host:
/u01/app/tfa/bin/tfactl access add -group xyz -local
To remove a user (for example, abc) from the Oracle Trace File Analyzer access list:
/u01/app/tfa/bin/tfactl access remove -user abc
To block a user (for example, xyz) from accessing Oracle Trace File Analyzer:
/u01/app/tfa/bin/tfactl access block -user xyz
To remove all Oracle Trace File Analyzer users and groups:
/u01/app/tfa/bin/tfactl access removeall
F.2 Running Summary and Analysis Commands
Use these commands to view the summary of deployment and status of Oracle Trace
File Analyzer, and changes and events detected by Oracle Trace File Analyzer.
• Use the tfactl summary command to view the summary of Oracle Trace File Analyzer deployment.
• Use the tfactl changes command to view the changes detected by Oracle Trace File Analyzer.
• Use the tfactl events command to view the events detected by Oracle Trace File Analyzer.
• Use the tfactl analyze command to obtain an analysis of your system by parsing the database, Oracle ASM, and Oracle Grid Infrastructure alert logs, system message logs, OSWatcher Top, and OSWatcher Slabinfo files.
• Use the tfactl run command to run a requested action (an inventory, a scan, or any support tool).
• Use the tfactl toolstatus command to view the status of Oracle Trace File Analyzer Support Tools across all nodes.
F.2.1 tfactl summary
Use the tfactl summary command to view the summary of Oracle Trace File Analyzer deployment.
Syntax
tfactl summary
Example
$ tfactl summary
Output from host : myserver69
------------------------------
=====
Nodes
===== myserver69 myserver70 myserver71
=====
Homes
=====
.------------------------------------------------------------------------------------
-------------------------------------.
| Home | Type | Version |
Database | Instance | Patches |
+------------------------------------------------+------+------------
+-------------------+----------------------+---------+
| /scratch/app/11.2.0.4/grid | GI | 11.2.0.4.0
| | | |
| /scratch/app/oradb/product/11.2.0/dbhome_11204 | DB | 11.2.0.4.0 | apxcmupg,rdb11204 | apxcmupg_1,rdb112041 | |
'------------------------------------------------+------+------------
+-------------------+----------------------+---------'
Output from host : myserver70
------------------------------
=====
Homes
=====
.------------------------------------------------------------------------------------
--------------------------.
| Home | Type | Version |
Database | Instance | Patches |
+------------------------------------------------+------+------------
+-------------------+-----------+---------+
| /scratch/app/11.2.0.4/grid | GI | 11.2.0.4.0
| | | |
| /scratch/app/oradb/product/11.2.0/dbhome_11204 | DB | 11.2.0.4.0 | apxcmupg,rdb11204 | rdb112042 | |
'------------------------------------------------+------+------------
+-------------------+-----------+---------'
Output from host : myserver71
------------------------------
=====
Homes
=====
.------------------------------------------------------------------------------------
--------------------------.
| Home | Type | Version |
Database | Instance | Patches |
+------------------------------------------------+------+------------
+-------------------+-----------+---------+
| /scratch/app/11.2.0.4/grid | GI | 11.2.0.4.0
| | | |
| /scratch/app/oradb/product/11.2.0/dbhome_11204 | DB | 11.2.0.4.0 | apxcmupg,rdb11204 | rdb112043 | |
'------------------------------------------------+------+------------
+-------------------+-----------+---------'
F.2.2 tfactl changes
Use the tfactl changes command to view the changes detected by Oracle Trace File Analyzer.
Syntax
tfactl changes
Example
$ tfactl changes
Output from host : myserver69
------------------------------
Output from host : myserver70
------------------------------
Jul/26/2016 10:20:35 : Parameter 'sunrpc.transports' value changed : tcp 1048576 => udp 32768
Jul/26/2016 10:20:35 : Parameter 'sunrpc.transports' value changed : tcp 1048576 => tcp-bc 1048576
Output from host : myserver71
------------------------------
Jul/26/2016 10:21:06 : Parameter 'sunrpc.transports' value changed : tcp 1048576 => udp 32768
Jul/26/2016 10:21:06 : Parameter 'sunrpc.transports' value changed : tcp 1048576 => tcp-bc 1048576
-bash-4.1# tfactl analyze
INFO: analyzing all (Alert and Unix System Logs) logs for the last 60 minutes...
Please wait...
INFO: analyzing host: myserver69
Report title: Analysis of Alert,System Logs
Report date range: last ~1 hour(s)
Report (default) time zone: UTC - Coordinated Universal Time
Analysis started at: 26-Jul-2016 10:36:03 AM UTC
Elapsed analysis time: 1 second(s).
Configuration file: /scratch/app/11.2.0.4/grid/tfa/myserver69/tfa_home/ext/tnt/conf/tnt.prop
Configuration group: all
Total message count: 15,261, from 20-Nov-2015 02:06:21 AM
UTC to 26-Jul-2016 10:10:58 AM UTC
Messages matching last ~1 hour(s): 1, from 26-Jul-2016 10:10:58 AM
UTC to 26-Jul-2016 10:10:58 AM UTC
last ~1 hour(s) error count: 0
last ~1 hour(s) ignored error count: 0
last ~1 hour(s) unique error count: 0
Message types for last ~1 hour(s)
Occurrences percent server name type
----------- ------- -------------------- -----
1 100.0% myserver69 generic
----------- -------
1 100.0%
Unique error messages for last ~1 hour(s)
Occurrences percent server name error
----------- ------- -------------------- -----
----------- -------
0 100.0%
F.2.3 tfactl events
Use the tfactl events command to view the events detected by Oracle Trace File Analyzer.
Syntax
tfactl events
Example
$ tfactl events
Output from host : myserver69
------------------------------
Jul/25/2016 06:25:33 :
[crs.myserver69] : [cssd(7513)]CRS-1603:CSSD on node myserver69 shutdown by user.
Jul/25/2016 06:32:41 :
[crs.myserver69] : [cssd(5794)]CRS-1601:CSSD Reconfiguration complete.
Active nodes are myserver69 myserver70 myserver71 .
Jul/25/2016 06:47:37 :
[crs.myserver69] : [/scratch/app/11.2.0.4/grid/bin/scriptagent.bin(16233)]
CRS-5818:Aborted command 'start' for resource 'ora.oc4j'. Details at (:CRSAGF00113:)
{1:32892:193} in /scratch/app/11.2.0.4/grid/log/myserver69/agent/crsd/scriptagent_oragrid/scriptagent_oragrid.log.
Jul/25/2016 06:24:43 :
[db.apxcmupg.apxcmupg_1] : Instance terminated by USER, pid = 21581
Jul/25/2016 06:24:43 :
[db.rdb11204.rdb112041] : Instance terminated by USER, pid = 18683
Jul/25/2016 06:24:44 :
[db.+ASM1] : ORA-15032: not all alterations performed
[db.+ASM1] : ORA-15001: diskgroup "FRA" does not exist or is not mounted
[db.+ASM1] : ORA-15032: not all alterations performed
[db.+ASM1] : ORA-15001: diskgroup "FRA" does not exist or is not mounted
[db.+ASM1] : ORA-15032: not all alterations performed
[db.+ASM1] : ORA-15001: diskgroup "FRA" does not exist or is not mounted
[db.+ASM1] : ORA-15032: not all alterations performed
[db.+ASM1] : ORA-15001: diskgroup "FRA" does not exist or is not mounted
[db.+ASM1] : ORA-15032: not all alterations performed
[db.+ASM1] : ORA-15001: diskgroup "FRA" does not exist or is not mounted
[db.+ASM1] : ORA-15032: not all alterations performed
[db.+ASM1] : ORA-15001: diskgroup "DATA" does not exist or is not mounted
[db.+ASM1] : ORA-15032: not all alterations performed
[db.+ASM1] : ORA-15001: diskgroup "DATA" does not exist or is not mounted
[db.+ASM1] : ORA-15032: not all alterations performed
[db.+ASM1] : ORA-15001: diskgroup "DATA" does not exist or is not mounted
[db.+ASM1] : ORA-15032: not all alterations performed
[db.+ASM1] : ORA-15001: diskgroup "DATA" does not exist or is not mounted
[db.+ASM1] : ORA-15032: not all alterations performed
[db.+ASM1] : ORA-15001: diskgroup "DATA" does not exist or is not mounted
Jul/25/2016 06:24:53 :
[db.+ASM1] : ORA-15032: not all alterations performed
[db.+ASM1] : ORA-15027: active use of diskgroup "VDATA" precludes its dismount
Jul/25/2016 06:25:22 :
[db.+ASM1] : Shutting down instance (immediate)
[db.+ASM1] : Shutting down instance: further logons disabled
Summary :
=========
INFO : 2
ERROR : 26
WARNING : 1
F.2.4 tfactl analyze
Use the tfactl analyze command to obtain an analysis of your system by parsing the database, Oracle ASM, and Oracle Grid Infrastructure alert logs, system message logs, OSWatcher Top, and OSWatcher Slabinfo files.
Filter the output of the command by component, error type, and time.
With the tfactl analyze command, you can choose from the following types of log file analysis:
• Show the most common messages within the logs: This analysis provides a quick indication of where larger issues are occurring. Oracle Trace File Analyzer takes important messages out of the alert logs, strips the extraneous information from the log messages, organizes the most commonly occurring messages, and displays them in order from most common to least common. By default, Oracle Trace File Analyzer analyzes error messages, but you can specify a particular type of message for analysis.
• Search for text within log messages: This is similar to using the grep utility to search, only faster, because Oracle Trace File Analyzer checks the time of each message and shows only those matching the last x number of minutes or any interval of time.
• Analyze the Oracle OSWatcher log statistics: Oracle Trace File Analyzer reads the various statistics available in the OSWatcher log files and provides detailed analysis showing the first, highest, lowest, average, and last three readings of each statistic. Choose any interval down to a specific minute or second. Oracle Trace File Analyzer optionally provides the original data from the OSWatcher logs for each value reported on (data point).
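As noted above, -search is conceptually close to grep with a time filter. For illustration only, here is plain grep over a small made-up alert log, showing the case-insensitive default and a case-sensitive variant; the file and its contents are invented for the demo.

```shell
# Build a tiny sample "alert log" (contents invented for the demo).
log=/tmp/sample_alert_demo.log
cat > "$log" <<'EOF'
ORA-15001: diskgroup "FRA" does not exist or is not mounted
Shutting down instance (immediate)
ERROR: terminating the instance due to error 15001
EOF

# Case-insensitive search, like the default behavior of -search "error":
grep -i "error" "$log"

# Case-sensitive search, analogous to tfactl's -search "/ORA-/c" form:
grep "ORA-" "$log"
```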
Syntax
tfactl analyze [-search "pattern"]
  [-comp db | asm | crs | acfs | os | osw | oswslabinfo | all]
  [-type error | warning | generic] [-last nh|d]
  [-from "MMM/DD/YYYY HH24:MI:SS"] [-to "MMM/DD/YYYY HH24:MI:SS"]
  [-for "MMM/DD/YYYY HH24:MI:SS"]
  [-node all | local | n1,n2,...] [-verbose] [-o file]
Parameters
Table F-5 tfactl analyze Command Parameters

-search "pattern"
    Searches for a pattern, enclosed in double quotation marks (""), in system and alert logs within a specified time range. This parameter supports both case-sensitive and case-insensitive search in alert and system message files across the cluster within the specified filters. The default is case-insensitive.
    If you do not specify the -search parameter, then Oracle Trace File Analyzer provides a summary of messages within the specified filters from alert and system log messages across the cluster. Oracle Trace File Analyzer displays message counts grouped by type (error, warning, and generic) and shows unique messages in a table organized by the message type selected for analysis. The generic message type is assigned to all messages that are neither error nor warning messages.
-comp db | asm | crs | acfs | os | osw | oswslabinfo | all
    Select which components you want Oracle Trace File Analyzer to analyze. The default is all.
    • db: Database alert logs
    • asm: Oracle ASM alert logs
    • crs: Oracle Grid Infrastructure alert logs
    • acfs: Oracle ACFS alert logs
    • os: System message files
    • osw: OSWatcher Top output
    • oswslabinfo: OSWatcher Slabinfo output
    When OSWatcher data is available, the osw and oswslabinfo components provide summary views of OSWatcher data.
-type error | warning | generic
    Select what type of messages Oracle Trace File Analyzer analyzes. The default is error.
-last n[h|d]
    Specify an amount of time, in hours or days, before the current time that you want Oracle Trace File Analyzer to analyze.
-from | -to | -for "MMM/DD/YYYY HH24:MI:SS"
    Specify a time interval, using the -from and -to parameters together, or a specific time using the -for parameter, that you want Oracle Trace File Analyzer to analyze.
-node all | local | n1,n2,...
    Specify a comma-separated list of host names. Use -local to analyze files on the local node. The default is all.
-verbose
    Displays verbose output.
-o file
    Specify a file where Oracle Trace File Analyzer writes the output instead of displaying it on the screen.
-type Parameter Arguments
The tfactl analyze command classifies all messages into different categories when you specify the -type parameter. The analysis component provides a count of messages by the message type you configure and lists all unique messages, grouped by count, within the specified filters. The message type patterns for each argument are listed in the following table.
Table F-6 tfactl analyze -type Parameter Arguments

error
    Error message patterns for database and Oracle ASM alert logs:
    .*ORA-00600:.*
    .*ORA-07445:.*
    .*IPC Send timeout detected. Sender: ospid.*
    .*Direct NFS: channel id .* path .* to filer .* PING timeout.*
    .*Direct NFS: channel id .* path .* to filer .* is DOWN.*
    .*ospid: .* has not called a wait for .* secs.*
    .*IPC Send timeout to .* inc .* for msg type .* from opid.*
    .*IPC Send timeout: Terminating pid.*
    .*Receiver: inst .* binc .* ospid.*
    .* terminating instance due to error.*
    .*: terminating the instance due to error.*
    .*Global Enqueue Services Deadlock detected
    Error message patterns for Oracle Grid Infrastructure alert logs:
    .*CRS-8011:.*, .*CRS-8013:.*, .*CRS-1607:.*, .*CRS-1615:.*,
    .*CRS-1714:.*, .*CRS-1656:.*, .*PRVF-5305:.*, .*CRS-1601:.*,
    .*CRS-1610:.*, .*PANIC. CRSD exiting:.*, .*Fatal Error from AGFW Proxy:.*
warning
    Warning message patterns for database and Oracle ASM alert logs:
    NOTE: process .* initiating offline of disk .*
    .*WARNING: cache read a corrupted block group.*
    .*NOTE: a corrupted block from group FRA was dumped to
generic
    Any messages that do not match any of the preceding patterns.
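The entries in Table F-6 are ordinary regular expressions. For illustration, two of the documented error patterns applied with grep -E to invented sample lines:

```shell
# Sample alert-log lines (invented) to exercise two of the patterns.
sample=/tmp/alert_pattern_demo.txt
cat > "$sample" <<'EOF'
ORA-00600: internal error code, arguments: [kcbgtcr_5]
ORA-01555: snapshot too old: rollback segment number 9
PMON (ospid: 1234): terminating the instance due to error 472
EOF

# Pattern for internal errors:
grep -E '.*ORA-00600:.*' "$sample"

# Pattern for instance termination:
grep -E '.*: terminating the instance due to error.*' "$sample"
```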
Examples
The following command examples demonstrate how to use Oracle Trace File Analyzer to search collected data:
• $ tfactl analyze -search "error" -last 2d
  Oracle Trace File Analyzer searches alert and system log files from the past two days for messages that contain the case-insensitive string "error".
• $ tfactl analyze -comp os -for "Jul/01/2016 11" -search "."
  Oracle Trace File Analyzer displays all system log messages for July 1, 2016 at 11 am.
• $ tfactl analyze -search "/ORA-/c" -comp db -last 2d
  Oracle Trace File Analyzer searches database alert logs for the case-sensitive string "ORA-" from the past two days.
The following command examples demonstrate how to use Oracle Trace File Analyzer to analyze collected data:
• $ tfactl analyze -last 5h
  Oracle Trace File Analyzer displays a summary of events collected from all alert logs and system messages from the past five hours.
• $ tfactl analyze -comp os -last 1d
  Oracle Trace File Analyzer displays a summary of events from system messages from the past day.
• $ tfactl analyze -last 1h -type generic
  Oracle Trace File Analyzer analyzes all generic messages from the last hour.
The following command examples demonstrate how to use Oracle Trace File Analyzer to analyze OSWatcher Top and Slabinfo output:
• $ tfactl analyze -comp osw -last 6h
  Oracle Trace File Analyzer displays the OSWatcher Top summary for the past six hours.
• $ tfactl analyze -comp oswslabinfo -from "2016-07-01" -to "2016-07-03"
  Oracle Trace File Analyzer displays the OSWatcher Slabinfo summary for the specified time period.
F.2.5 tfactl run
Use the tfactl run command to run a requested action (an inventory, a scan, or any support tool).
Syntax
tfactl run [inventory | scan | tool]
Parameters
Table F-7 tfactl run Command Parameters

inventory
    Inventories all trace file directories.
scan
    Runs a one-off scan.
tool
    Runs the desired analysis tool.
Analysis Tools

Table F-8 tfactl run Analysis Tools Parameters

changes
    Prints system changes.
events
    Lists all important events in the system.
exachk
    Runs Oracle EXAchk.
grep
    Greps for an input string in logs.
history
    Lists commands run in the current Oracle Trace File Analyzer shell session.
ls
    Searches files in Oracle Trace File Analyzer.
orachk
    Runs Oracle ORAchk.
oratop
    Runs oratop.
oswbb
    Runs OSWatcher Analyzer.
param
    Prints a parameter value.
ps
    Finds a process.
pstack
    Runs pstack on a process.
prw
    Runs Procwatcher.
sqlt
    Runs SQLT.
summary
    Prints a system summary.
tail
    Tails log files.
vi
    Searches and opens files in the vi editor.
Profiling Tools

Table F-9 tfactl run Profiling Tools Parameters

dbglevel
    Sets CRS log and trace levels using profiles.
F.2.6 tfactl toolstatus
Use the tfactl toolstatus command to view the status of Oracle Trace File Analyzer Support Tools across all nodes.
Syntax
$ tfactl toolstatus
Example
The tfactl toolstatus command returns output similar to the following, showing which tools are deployed and where they are deployed.
Table F-10 tfactl toolstatus Output

Host       Tool           Status
hostname   alertsummary   DEPLOYED
hostname   exachk         DEPLOYED
hostname   ls             DEPLOYED
hostname   triage         DEPLOYED
hostname   pstack         DEPLOYED
hostname   orachk         DEPLOYED
hostname   sqlt           DEPLOYED
hostname   grep           DEPLOYED
hostname   summary        DEPLOYED
hostname   vi             DEPLOYED
hostname   prw            NOT RUNNING
hostname   tail           DEPLOYED
hostname   param          DEPLOYED
hostname   dbglevel       DEPLOYED
hostname   managelogs     DEPLOYED
hostname   history        DEPLOYED
hostname   oratop         DEPLOYED
hostname   calog          DEPLOYED
hostname   menu           DEPLOYED
hostname   oswbb          RUNNING
hostname   changes        DEPLOYED
hostname   events         DEPLOYED
hostname   ps             DEPLOYED
hostname   srdc           DEPLOYED
F.3 Running Diagnostic Collection Commands
Run the diagnostic collection commands to collect diagnostic data.
• Use the tfactl diagcollect command to perform on-demand diagnostic collection.
• Use the tfactl directory command to add a directory to, or remove a directory from, the list of directories whose trace or log files are analyzed.
• Use the tfactl ips command to collect Automatic Diagnostic Repository diagnostic data.
• Use the tfactl collection command to stop a running Oracle Trace File Analyzer collection.
• Use the tfactl print command to print information from the Berkeley database.
• Use the tfactl purge command to delete diagnostic collections from the Oracle Trace File Analyzer repository that are older than a specific time.
• Use the tfactl managelogs command to manage Automatic Diagnostic Repository log and trace files.
F.3.1 tfactl diagcollect
Use the tfactl diagcollect command to perform on-demand diagnostic collection.
Oracle Trace File Analyzer Collector can perform three types of on-demand collections:
• Default collections
• Event-driven Support Service Request Data Collection (SRDC) collections
• Custom collections
Prerequisites
Event-driven Support Service Request Data Collection (SRDC) collections require components from the Oracle Trace File Analyzer Database Support Tools Bundle, which is available from My Oracle Support Note 1513912.2.
Syntax
tfactl diagcollect [-all | [component_name1] [component_name2] ... [component_nameN]]
  [-node all|local|n1,n2,...] [-tag description]
  [-z filename]
  [-last nh|d | -from time -to time | -for time]
  [-nocopy] [-notrim] [-silent] [-nocores] [-collectalldirs]
  [-collectdir dir1,dir2,...] [-examples]

components: -ips|-database|-asm|-crsclient|-dbclient|-dbwlm|-tns|-rhp|-procinfo|-afd|-crs|-wls|-emagent|-oms|-ocm|-emplugins|-em|-acfs|-install|-cfgtools|-os|-ashhtml|-ashtext|-awrhtml|-awrtext
Parameters

Each option must be prefixed with a minus sign (-).

-all | [component_name1] [component_name2] ... [component_nameN]
    Specify that you want to collect data for all components, or list the specific components for which you want to obtain collections.

-node all|local|n1,n2,...
    Specify a comma-delimited list of nodes from which to collect diagnostic information. Default is all.

-tag description
    Use this parameter to create a subdirectory for the resulting collection in the Oracle Trace File Analyzer repository.

-z file_name
    Use this parameter to specify an output file name.
-last numberh|d | -from "mmm/dd/yyyy hh:mm:ss" -to "mmm/dd/yyyy hh:mm:ss" | -for "mmm/dd/yyyy hh:mm:ss"
    • Specify the -last parameter to collect files that have relevant data for the past specified number of hours (h) or days (d). By default, using the command with this parameter also trims files that are large and shows files only from the specified interval.
    • Specify the -from and -to parameters (you must use these two parameters together) to collect files that have relevant data during a specific time interval, and trim data before this time where files are large.
    • Specify the -for parameter to collect files that have relevant data for the time given. The files that TFACTL collects have timestamps that bracket the time you specify after -for. No data trimming is done for this option.
    Note: If you specify both date and time, then you must enclose both values in double quotation marks (""). If you specify only the date or only the time, then you do not have to enclose the single value in quotation marks.
-nocopy
    Specify this parameter to stop the resultant trace file collection from being copied back to the initiating node. The files remain in the Oracle Trace File Analyzer repository on the executing node.

-notrim
    Specify this parameter to stop trimming the files collected.

-silent
    Specify this parameter to run the diagnostic collection as a background process.

-nocores
    Specify this parameter to stop collecting core files when they would normally have been collected.

-collectalldirs
    Specify this parameter to collect all files from any directory that has the Collect All flag marked true.

-collectdir dir1,dir2,...dirn
    Specify a comma-delimited list of directories; the collection includes all files from these directories, irrespective of type and time constraints, in addition to the components specified.

-examples
    Specify this parameter to view diagcollect usage examples.
Examples
• The following command trims and zips all files updated in the last 12 hours (the default period), including chmos and osw data, from across the cluster and collects them on the initiating node:
$ tfactl diagcollect -all
Collecting data for the last 12 hours for this component ...
Collecting data for all nodes
Creating ips package in master node ...
Trying ADR basepath /scratch/app/orabase
Trying to use ADR homepath diag/crs/node1/crs ...
Submitting request to generate package for ADR homepath /scratch/app/orabase/diag/crs/node1/crs
Trying ADR basepath /scratch/app/oracle
Trying to use ADR homepath diag/rdbms/prod/prod_1 ...
Submitting request to generate package for ADR homepath /scratch/app/oracle/diag/rdbms/prod/prod_1
Trying to use ADR homepath diag/rdbms/prod/prod_2 ...
Submitting request to generate package for ADR homepath /scratch/app/oracle/diag/rdbms/prod/prod_2
Trying to use ADR homepath diag/rdbms/webdb/webdb_2 ...
Submitting request to generate package for ADR homepath /scratch/app/oracle/diag/rdbms/webdb/webdb_2
Trying to use ADR homepath diag/rdbms/webdb/webdb_1 ...
Submitting request to generate package for ADR homepath /scratch/app/oracle/diag/rdbms/webdb/webdb_1
Master package completed for ADR homepath /scratch/app/oracle/diag/rdbms/prod/prod_1
Master package completed for ADR homepath /scratch/app/oracle/diag/rdbms/prod/prod_2
Master package completed for ADR homepath /scratch/app/oracle/diag/rdbms/webdb/webdb_1
Master package completed for ADR homepath /scratch/app/oracle/diag/rdbms/webdb/webdb_2
Master package completed for ADR homepath /scratch/app/orabase/diag/crs/node1/crs
Created package 2 based on time range 2016-09-29 12:11:00.000000 -07:00 to 2016-09-30 00:11:00.000000 -07:00, correlation level basic
Remote package completed for ADR homepath(s) /diag/crs/node2/crs,/diag/crs/node3/crs
Collection Id : 20160930001113node1
Detailed Logging at : /scratch/app/orabase/tfa/repository/collection_Fri_Sep_30_00_11_13_PDT_2016_node_all/diagcollect_20160930001113_node1.log
2016/09/30 00:12:21 PDT : Collection Name : tfa_Fri_Sep_30_00_11_13_PDT_2016.zip
2016/09/30 00:12:21 PDT : Collecting diagnostics from hosts : [node1, node3, node2]
2016/09/30 00:12:21 PDT : Scanning of files for Collection in progress...
2016/09/30 00:12:21 PDT : Collecting additional diagnostic information...
2016/09/30 00:12:26 PDT : Getting list of files satisfying time range
[09/29/2016 12:12:21 PDT, 09/30/2016 00:12:26 PDT]
2016/09/30 00:13:05 PDT : Collecting ADR incident files...
2016/09/30 00:15:02 PDT : Completed collection of additional diagnostic information...
2016/09/30 00:15:24 PDT : Completed Local Collection
2016/09/30 00:15:26 PDT : Remote Collection in Progress...
.------------------------------------.
| Collection Summary |
+---------+-----------+-------+------+
| Host | Status | Size | Time |
+---------+-----------+-------+------+
| node3 | Completed | 82MB | 172s |
| node2 | Completed | 95MB | 183s |
| node1 | Completed | 157MB | 183s |
'---------+-----------+-------+------'
Logs are being collected to: /scratch/app/orabase/tfa/repository/collection_Fri_Sep_30_00_11_13_PDT_2016_node_all
/scratch/app/orabase/tfa/repository/collection_Fri_Sep_30_00_11_13_PDT_2016_node_all/node3.tfa_Fri_Sep_30_00_11_13_PDT_2016.zip
/scratch/app/orabase/tfa/repository/collection_Fri_Sep_30_00_11_13_PDT_2016_node_all/node2.tfa_Fri_Sep_30_00_11_13_PDT_2016.zip
/scratch/app/orabase/tfa/repository/collection_Fri_Sep_30_00_11_13_PDT_2016_node_all/node1.tfa_Fri_Sep_30_00_11_13_PDT_2016.zip
• The following command trims and zips all files updated in the last eight hours, including chmos and osw data, from across the cluster and collects them on the initiating node:
$ tfactl diagcollect -all -last 8h
• The following command trims and zips all files from databases hrdb and fdb updated in the last day and collects them on the initiating node:
$ tfactl diagcollect -database hrdb,fdb -last 1d -z foo
• The following command trims and zips all Oracle Clusterware files, operating system logs, and chmos and osw data from node1 and node2, updated in the last six hours, and collects them on the initiating node:
$ tfactl diagcollect -crs -os -node node1,node2 -last 6h
• The following command trims and zips all Oracle ASM logs from node1 updated between September 22, 2016, and 21:00 on September 23, 2016, and collects them on the initiating node:
$ tfactl diagcollect -asm -node node1 -from Sep/22/2016 -to "Sep/23/2016 21:00:00"
• The following command trims and zips all log files updated on September 23, 2016, and collects them on the initiating node:
$ tfactl diagcollect -for Sep/23/2016
• The following command trims and zips all log files updated from 09:00 on September 22, 2016, to 09:00 on September 23, 2016 (12 hours before and after the time specified in the command), and collects them on the initiating node:
$ tfactl diagcollect -for "September/22/2016 21:00:00"
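Several of the parameters above can also be combined in one custom collection. The following sketch (with hypothetical directory paths) runs the collection as a background process, skips trimming, and pulls in two extra directories irrespective of file type and time constraints:

```shell
$ tfactl diagcollect -database hrdb -last 4h -silent -notrim \
    -collectdir /u01/app/custom/logs,/var/log/myapp
```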
Related Topics:
• https://support.oracle.com/rs?type=doc&id=1513912.2
F.3.2 tfactl directory
Use the tfactl directory command to add a directory to, or remove a directory from, the list of directories whose trace or log files are analyzed.

Also use the tfactl directory command to change directory permissions. When automatic discovery adds a directory, the directory is added as public. Any user who has sufficient permissions to run the tfactl diagcollect command can collect any file in that directory. This is only important when non-root or sudo users run TFACTL commands.

If a directory is marked as private, then Oracle Trace File Analyzer, before allowing any files to be collected:
• Determines which user is running TFACTL commands
• Verifies whether the user has permissions to see the files in the directory
Note:
A user can add only directories to which they have read access. If you have automatic diagnostic collections configured, then Oracle Trace File Analyzer runs as root and can collect all available files.

The tfactl directory command includes three verbs with which you can manage directories: add, remove, and modify.
Syntax
tfactl directory add directory [-public] [-exclusions | -noexclusions | -collectall] [-node all | n1,n2,...]

tfactl directory remove directory [-node all | n1,n2,...]

tfactl directory modify directory [-private | -public] [-exclusions | -noexclusions | -collectall]

For each of the three syntax models, you must specify a directory path.
Parameters

Table F-11 tfactl directory Command Parameters

-public
    Use the -public parameter to make the files contained in the directory available for collection by any Oracle Trace File Analyzer user.

-private
    Use the -private parameter to prevent an Oracle Trace File Analyzer user who does not have permission to see the files in the directory (and any subdirectories) you are adding or modifying from collecting files from that directory.

-exclusions
    Use the -exclusions parameter to specify that files in this directory are eligible for collection if they satisfy type, name, and time range restrictions.

-noexclusions
    Use the -noexclusions parameter to specify that files in this directory are eligible for collection if they satisfy time range restrictions.

-collectall
    Use the -collectall parameter to specify that files in this directory are eligible for collection irrespective of type and time range when the user specifies the -collectalldirs parameter with the tfactl diagcollect command.

-node all | n1,n2,...
    Add or remove directories on every node in the cluster, or use a comma-delimited list to add or remove directories on specific nodes.
Usage Notes

You must add all trace directory names to the Berkeley database so that Oracle Trace File Analyzer can collect file metadata in that directory. The discovery process finds most directories, but if new or undiscovered directories are required, then you can add these manually using the tfactl directory command.

When you add a directory using TFACTL, Oracle Trace File Analyzer attempts to determine whether the directory is for:
• Oracle Database
• Oracle Clusterware
• Operating system logs
• Some other component
• Which database or instance

If Oracle Trace File Analyzer cannot determine this information, then it returns an error and requests that you enter the information, similar to the following:
# tfactl directory add /tmp
Failed to add directory to TFA. Unable to determine parameters for directory: /tmp
Please enter component for this Directory [RDBMS|CRS|ASM|INSTALL|OS|CFGTOOLS|TNS|DBWLM|ACFS|ALL] : RDBMS
Please enter database name for this Directory :MYDB
Please enter instance name for this Directory :MYDB1
Note:
For OS, CRS, CFGTOOLS, ACFS, ALL, or INSTALL files, only the component is requested, and for Oracle ASM only the instance is created. No verification is done for these entries, so use caution when entering this data.
Examples
The following command adds a directory:
# tfactl directory add /u01/app/grid/diag/asm/+ASM1/trace
The following command modifies a directory and makes the contents available for collection only to Oracle Trace File Analyzer users with sufficient permissions:
# tfactl directory modify /u01/app/grid/diag/asm/+ASM1/trace -private
The following command removes a directory from all nodes in the cluster:
# tfactl directory remove /u01/app/grid/diag/asm/+ASM1/trace -node all
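The parameters can also be combined. For example, the following sketch (the path is hypothetical) adds a directory on every node and marks its files for collection irrespective of type and time range whenever -collectalldirs is passed to tfactl diagcollect:

```shell
# tfactl directory add /u01/app/myapp/logs -collectall -node all
```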
F.3.3 tfactl ips
Use the tfactl ips command to collect Automatic Diagnostic Repository diagnostic data.
Syntax
tfactl ips [ADD] [ADD FILE] [ADD NEW INCIDENTS] [CHECK REMOTE KEYS] [COPY IN FILE]
[COPY OUT FILE] [CREATE PACKAGE] [DELETE PACKAGE] [FINALIZE PACKAGE] [GENERATE PACKAGE]
[GET MANIFEST] [GET METADATA] [GET REMOTE KEYS] [PACK] [REMOVE] [REMOVE FILE]
[SET CONFIGURATION] [SHOW CONFIGURATION] [SHOW FILES] [SHOW INCIDENTS] [SHOW PROBLEMS]
[SHOW PACKAGE] [UNPACK FILE] [UNPACK PACKAGE] [USE REMOTE KEYS] [options]
Parameters
Table F-12 tfactl ips Command Parameters

ADD                  Adds incidents to an existing package.
ADD FILE             Adds a file to an existing package.
ADD NEW INCIDENTS    Finds new incidents for the problems in the package and adds the latest ones to the package.
CHECK REMOTE KEYS    Creates a file with keys matching incidents in the specified package.
COPY IN FILE         Copies an external file into Automatic Diagnostic Repository, and associates it with a package and (optionally) an incident.
COPY OUT FILE        Copies an Automatic Diagnostic Repository file to a location outside Automatic Diagnostic Repository.
CREATE PACKAGE       Creates a package, and optionally selects contents for the package.
DELETE PACKAGE       Drops a package and its contents from Automatic Diagnostic Repository.
FINALIZE PACKAGE     Gets a package ready for shipping by automatically including correlated contents.
GENERATE PACKAGE     Creates a physical package (zip file) in the target directory.
GET MANIFEST         Extracts the manifest from a package file and displays it.
GET METADATA         Extracts the metadata XML document from a package file and displays it.
GET REMOTE KEYS      Creates a file with keys matching incidents in the specified package.
PACK                 Creates a package, and immediately generates the physical package.
REMOVE               Removes incidents from an existing package.
REMOVE FILE          Removes a file from an existing package.
SET CONFIGURATION    Changes the value of an Incident Packaging Service configuration parameter.
SHOW CONFIGURATION   Shows the current Incident Packaging Service settings.
SHOW FILES           Shows the files included in the specified package.
SHOW INCIDENTS       Shows the incidents included in the specified package.
SHOW PROBLEMS        Shows problems for the current Automatic Diagnostic Repository home.
SHOW PACKAGE         Shows details for the specified package.
UNPACK FILE          Unpacks a physical file into the specified path.
UNPACK PACKAGE       Unpacks physical files in the current directory into the specified path, if they match the package name.
USE REMOTE KEYS      Adds incidents matching the keys in the specified file to the specified package.
• Use the tfactl ips ADD command to add incidents to an existing package.
• Use the tfactl ips ADD FILE command to add a file to an existing package.
• Use the tfactl ips COPY IN FILE command to copy an external file into Automatic Diagnostic Repository, and associate the file with a package and (optionally) an incident.
• Use the tfactl ips REMOVE command to remove incidents from an existing package.
• Use the tfactl ips REMOVE FILE command to remove a file from an existing package.
• Use the tfactl ips ADD NEW INCIDENTS PACKAGE command to find new incidents for the problems in a specific package, and add the latest ones to the package.
• Use the tfactl ips GET REMOTE KEYS FILE command to create a file with keys matching incidents in a specific package.
• Use the tfactl ips USE REMOTE KEYS FILE command to add incidents matching the keys in a specific file to a specific package.
• Use the tfactl ips CREATE PACKAGE command to create a package, and optionally select the contents for the package.
• Use the tfactl ips FINALIZE PACKAGE command to get a package ready for shipping by automatically including correlated contents.
• Use the tfactl ips GENERATE PACKAGE command to create a physical package (zip file) in the target directory.
• Use the tfactl ips DELETE PACKAGE command to drop a package and its contents from the Automatic Diagnostic Repository.
• Use the tfactl ips GET MANIFEST FROM FILE command to extract the manifest from a package file and view it.
• Use the tfactl ips GET METADATA command to extract the metadata XML document from a package file and view it.
• Use the tfactl ips PACK command to create a package and immediately generate the physical package.
• Use the tfactl ips SET CONFIGURATION command to change the value of an Incident Packaging Service configuration parameter.
• Use the tfactl ips SHOW CONFIGURATION command to view the current Incident Packaging Service settings.
• Use the tfactl ips SHOW PACKAGE command to view the details of a specific package.
• Use the tfactl ips SHOW FILES PACKAGE command to view the files included in a specific package.
• Use the tfactl ips SHOW INCIDENTS PACKAGE command to view the incidents included in a specific package.
• Use the tfactl ips SHOW PROBLEMS command to view the problems for the current Automatic Diagnostic Repository home.
• Use the tfactl ips UNPACK FILE command to unpack a physical file into a specific path.
• Use the tfactl ips UNPACK PACKAGE command to unpack physical files in the current directory into a specific path, if they match the package name.
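Taken together, a typical packaging workflow chains several of these verbs. The following sketch uses illustrative incident and package IDs:

```shell
# Create a package seeded with incident 861
$ tfactl ips create package incident 861

# Pull in correlated contents, then build the physical zip file in /tmp
$ tfactl ips finalize package 12
$ tfactl ips generate package 12 in /tmp

# Inspect what went into the package
$ tfactl ips show files package 12
```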
F.3.3.1 tfactl ips ADD
Use the tfactl ips ADD command to add incidents to an existing package.

Syntax

tfactl ips ADD [INCIDENT incid | PROBLEM prob_id | PROBLEMKEY prob_key | SECONDS seconds | TIME start_time TO end_time] PACKAGE package_id
Parameters
Table F-13 tfactl ips ADD Command Parameters

incid        Specify the ID of the incident to add to the package contents.
prob_id      Specify the ID of the problem to add to the package contents.
prob_key     Specify the problem key to add to the package contents.
seconds      Specify the number of seconds before now for adding package contents.
start_time   Specify the start of the time range to look for incidents in.
end_time     Specify the end of the time range to look for incidents in.
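This section does not include an example; based on the syntax above, adding a single incident to a package might look like the following (the IDs are illustrative):

```shell
$ tfactl ips add incident 22 package 12
```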
F.3.3.2 tfactl ips ADD FILE
Use the tfactl ips ADD FILE command to add a file to an existing package.

Syntax

The file must be in the same ADR_BASE as the package.

tfactl ips ADD FILE file_spec PACKAGE pkgid
Parameters
Table F-14 tfactl ips ADD FILE Command Parameters

file_spec    Specify the file with file name and path (full or relative).
package_id   Specify the ID of the package to add the file to.
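Based on the syntax above, an invocation might look like the following sketch; the trace file path is illustrative and must be under the same ADR_BASE as the package:

```shell
$ tfactl ips add file ADR_HOME/trace/mydb1_ora_13579.trc package 12
```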
F.3.3.3 tfactl ips COPY IN FILE
Use the tfactl ips COPY IN FILE command to copy an external file into Automatic Diagnostic Repository, and associate the file with a package and (optionally) an incident.

Syntax

tfactl ips COPY IN FILE file [TO new_name] [OVERWRITE] PACKAGE pkgid [INCIDENT incid]
Parameters
Table F-15 tfactl ips COPY IN FILE Command Parameters

file       Specify the file with file name and full path (full or relative).
new_name   Specify a name for the copy of the file.
pkgid      Specify the ID of the package to associate the file with.
incid      Specify the ID of the incident to associate the file with.
Options

OVERWRITE: If the file exists, then use the OVERWRITE option to overwrite it.
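Based on the syntax above, the following sketch copies an external file into Automatic Diagnostic Repository and associates it with a package and an incident (the path and IDs are illustrative):

```shell
$ tfactl ips copy in file /tmp/key_file.txt to key_file.txt package 12 incident 22
```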
F.3.3.4 tfactl ips REMOVE
Use the tfactl ips REMOVE command to remove incidents from an existing package.

Syntax

The incidents remain associated with the package, but are not included in the physical package file.

tfactl ips REMOVE [INCIDENT incid | PROBLEM prob_id | PROBLEMKEY prob_key] PACKAGE package_id
Parameters
Table F-16 tfactl ips REMOVE Command Parameters

incid      Specify the ID of the incident to remove from the package contents.
prob_id    Specify the ID of the problem to remove from the package contents.
prob_key   Specify the problem key to remove from the package contents.
Example
$ tfactl ips remove incident 22 package 12
F.3.3.5 tfactl ips REMOVE FILE
Use the tfactl ips REMOVE FILE command to remove a file from an existing package.

Syntax

The file must be in the same ADR_BASE as the package. The file remains associated with the package, but is not included in the physical package file.

tfactl ips REMOVE FILE file_spec PACKAGE pkgid
Example
$ tfactl ips remove file ADR_HOME/trace/mydb1_ora_13579.trc package 12
F.3.3.6 tfactl ips ADD NEW INCIDENTS PACKAGE
Use the tfactl ips ADD NEW INCIDENTS PACKAGE command to find new incidents for the problems in a specific package, and add the latest ones to the package.

Syntax

tfactl ips ADD NEW INCIDENTS PACKAGE package_id
Parameters
Table F-17 tfactl ips ADD NEW INCIDENTS PACKAGE Command Parameters

package_id   Specify the ID of the package to add the incidents to.
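Based on the syntax above, an invocation might look like the following (the package ID is illustrative):

```shell
$ tfactl ips add new incidents package 12
```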
F.3.3.7 tfactl ips GET REMOTE KEYS FILE
Use the tfactl ips GET REMOTE KEYS FILE command to create a file with keys matching incidents in a specific package.
Syntax
tfactl ips GET REMOTE KEYS FILE file_spec PACKAGE package_id
Parameters
Table F-18 tfactl ips GET REMOTE KEYS FILE Command Parameters

file_spec    Specify the file with file name and full path (full or relative).
package_id   Specify the ID of the package to get keys for.
Example
$ tfactl ips get remote keys file /tmp/key_file.txt package 12
F.3.3.8 tfactl ips USE REMOTE KEYS FILE
Use the tfactl ips USE REMOTE KEYS FILE command to add incidents matching the keys in a specific file to a specific package.
Syntax
tfactl ips USE REMOTE KEYS FILE file_spec PACKAGE package_id
Example
$ tfactl ips use remote keys file /tmp/key_file.txt package 12
F.3.3.9 tfactl ips CREATE PACKAGE
Use the tfactl ips CREATE PACKAGE command to create a package, and optionally select the contents for the package.

Syntax

tfactl ips CREATE PACKAGE [INCIDENT incid | PROBLEM prob_id | PROBLEMKEY prob_key | SECONDS seconds | TIME start_time TO end_time]
[CORRELATE BASIC | TYPICAL | ALL] [MANIFEST file_spec] [KEYFILE file_spec]
Parameters
Table F-19 tfactl ips CREATE PACKAGE Command Parameters

incid        Specify the ID of the incident to use for selecting the package contents.
prob_id      Specify the ID of the problem to use for selecting the package contents.
prob_key     Specify the problem key to use for selecting the package contents.
seconds      Specify the number of seconds before now for selecting the package contents.
start_time   Specify the start of the time range to look for the incidents in.
end_time     Specify the end of the time range to look for the incidents in.
Options
• CORRELATE BASIC: The package includes the incident dumps and the incident process trace files. If the incidents share relevant correlation keys, then more incidents are included automatically.
• CORRELATE TYPICAL: The package includes the incident dumps and all trace files that were modified in a time window around each incident. If the incidents share relevant correlation keys, or occurred in a time window around the main incidents, then more incidents are included automatically.
• CORRELATE ALL: The package includes the incident dumps and all trace files that were modified between the first selected incident and the last selected incident. If the incidents occurred in the same time range, then more incidents are included automatically.
• MANIFEST file_spec: Generates the XML-format package manifest file.
• KEYFILE file_spec: Generates the remote key file.
Note:
• If you do not specify package contents, such as incident, problem, and so on, then Oracle Trace File Analyzer creates an empty package. You can add files and incidents later.
• If you do not specify the correlation level, then Oracle Trace File Analyzer uses the default level.
• The default is normally TYPICAL, but you can change it using the IPS SET CONFIGURATION command.
Example
$ tfactl ips create package incident 861

$ tfactl ips create package time '2006-12-31 23:59:59.00 -07:00' to '2007-01-01 01:01:01.00 -07:00'
F.3.3.10 tfactl ips FINALIZE PACKAGE
Use the tfactl ips FINALIZE PACKAGE command to get a package ready for shipping by automatically including correlated contents.
Syntax
tfactl ips FINALIZE PACKAGE package_id
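Based on the syntax above, finalizing a package might look like the following (the package ID is illustrative):

```shell
$ tfactl ips finalize package 12
```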
F.3.3.11 tfactl ips GENERATE PACKAGE
Use the tfactl ips GENERATE PACKAGE command to create a physical package (zip file) in the target directory.
Syntax
tfactl ips GENERATE PACKAGE package_id [IN path][COMPLETE | INCREMENTAL]
Parameters
Table F-20 tfactl ips GENERATE PACKAGE Command Parameters

package_id   Specify the ID of the package to create the physical package file for.
path         Specify the path where the physical package file must be generated.
Options

• COMPLETE: (Default) The package includes all package files even if a previous package sequence was generated.
• INCREMENTAL: The package includes only the files that have been added or changed since the last package was generated.
Note:
If no target path is specified, then Oracle Trace File Analyzer generates the physical package file in the current working directory.
Example
$ tfactl ips generate package 12 in /tmp
F.3.3.12 tfactl ips DELETE PACKAGE
Use the tfactl ips DELETE PACKAGE command to drop a package and its contents from the Automatic Diagnostic Repository.
Syntax
tfactl ips DELETE PACKAGE package_id
Parameters
Table F-21 tfactl ips DELETE PACKAGE Command Parameters

package_id   Specify the ID of the package to delete.
Example
$ tfactl ips delete package 12
F.3.3.13 tfactl ips GET MANIFEST FROM FILE
Use the tfactl ips GET MANIFEST FROM FILE command to extract the manifest from a package file and view it.
Syntax
tfactl ips GET MANIFEST FROM FILE file
Parameters
Table F-22 tfactl ips GET MANIFEST FROM FILE Command Parameters

file   Specify the external file with file name and full path.
Example
$ tfactl ips GET MANIFEST FROM FILE /tmp/IPSPKG_200704130121_COM_1.zip
F.3.3.14 tfactl ips GET METADATA
Use the tfactl ips GET METADATA command to extract the metadata XML document from a package file and view it.
Syntax
tfactl ips GET METADATA [FROM FILE file | FROM ADR]
Example
$ tfactl ips get metadata from file /tmp/IPSPKG_200704130121_COM_1.zip
F.3.3.15 tfactl ips PACK
Use the tfactl ips PACK command to create a package and immediately generate the physical package.
Syntax
tfactl ips PACK [INCIDENT incid | PROBLEM prob_id | PROBLEMKEY prob_key | SECONDS seconds | TIME start_time TO end_time]
[CORRELATE BASIC | TYPICAL | ALL] [MANIFEST file_spec] [KEYFILE file_spec]
Parameters
Table F-23 tfactl ips PACK Command Parameters

incid        Specify the ID of the incident to use for selecting the package contents.
prob_id      Specify the ID of the problem to use for selecting the package contents.
prob_key     Specify the problem key to use for selecting the package contents.
seconds      Specify the number of seconds before the current time for selecting the package contents.
start_time   Specify the start of the time range to look for the incidents in.
end_time     Specify the end of the time range to look for the incidents in.
path         Specify the path where the physical package file must be generated.
Options
• CORRELATE BASIC: The package includes the incident dumps and the incident process trace files. If the incidents share relevant correlation keys, then more incidents are included automatically.
• CORRELATE TYPICAL: The package includes the incident dumps and all trace files that were modified in a time window around each incident. If the incidents share relevant correlation keys, or occurred in a time window around the main incidents, then more incidents are included automatically.
• CORRELATE ALL: The package includes the incident dumps and all trace files that were modified between the first selected incident and the last selected incident. If the incidents occurred in the same time range, then more incidents are included automatically.
• MANIFEST file_spec: Generates the XML-format package manifest file.
• KEYFILE file_spec: Generates the remote key file.
Note:
If you do not specify package contents, such as incident, problem, and so on, then Oracle Trace File Analyzer creates an empty package. You can add files and incidents later.
If you do not specify the correlation level, then Oracle Trace File Analyzer uses the default level. The default is normally TYPICAL, but you can change it using the IPS SET CONFIGURATION command.
Example
$ tfactl ips pack incident 861
$ tfactl ips pack time '2006-12-31 23:59:59.00 -07:00' to '2007-01-01 01:01:01.00 -07:00'
F.3.3.16 tfactl ips SET CONFIGURATION
Use the tfactl ips SET CONFIGURATION command to change the value of an Incident Packaging Service configuration parameter.
Syntax
tfactl ips SET CONFIGURATION parameter_id value
Parameters
Table F-24 tfactl ips SET CONFIGURATION Command Parameters

parameter_id   Specify the ID of the parameter to change.
value          Specify the new value for the parameter.
Example
$ tfactl ips set configuration 6 2
F.3.3.17 tfactl ips SHOW CONFIGURATION
Use the tfactl ips SHOW CONFIGURATION command to view the current Incident Packaging Service settings.
Syntax
tfactl ips SHOW CONFIGURATION parameter_id
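Based on the syntax above, displaying a single setting might look like the following; the parameter ID is illustrative:

```shell
$ tfactl ips show configuration 3
```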
F.3.3.18 tfactl ips SHOW PACKAGE
Use the tfactl ips SHOW PACKAGE command to view the details of a specific package.
Syntax
tfactl ips SHOW PACKAGE package_id [BASIC | BRIEF | DETAIL]
Note:
You can specify the level of detail to use with this command.
BASIC: Shows a minimal amount of information. It is the default when no package ID is specified.
BRIEF: Shows a more extensive amount of information. It is the default when a package ID is specified.
DETAIL: Shows the same information as BRIEF, and also some package history and information on included incidents and files.
Example
$ tfactl ips show package
$ tfactl ips show package 12 detail
F.3.3.19 tfactl ips SHOW FILES PACKAGE
Use the tfactl ips SHOW FILES PACKAGE command to view the files included in a specific package.
Syntax
tfactl ips SHOW FILES PACKAGE package_id
Example
$ tfactl ips show files package 12
F.3.3.20 tfactl ips SHOW INCIDENTS PACKAGE
Use the tfactl ips SHOW INCIDENTS PACKAGE command to view the incidents included in a specific package.
Syntax
tfactl ips SHOW INCIDENTS PACKAGE package_id
Example
$ tfactl ips show incidents package 12
F.3.3.21 tfactl ips SHOW PROBLEMS
Use the tfactl ips SHOW PROBLEMS command to view the problems for the current Automatic Diagnostic Repository home.
Syntax
tfactl ips SHOW PROBLEMS
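Example

The command takes no arguments, so a typical invocation is simply:

$ tfactl ips show problems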
F.3.3.22 tfactl ips UNPACK FILE
Use the tfactl ips UNPACK FILE command to unpack a physical file into a specific path.
Syntax
Running the following command automatically creates a valid ADR_HOME structure. The path must exist and be writable.
tfactl ips UNPACK FILE file_spec [INTO path]
Example
$ tfactl ips unpack file /tmp/IPSPKG_20061026010203_COM_1.zip into /tmp/newadr
F.3.3.23 tfactl ips UNPACK PACKAGE
Use the tfactl ips UNPACK PACKAGE command to unpack physical files in the current directory into a specific path, if they match the package name.
Syntax
Running the following command automatically creates a valid ADR_HOME structure. The path must exist and be writable.
tfactl ips UNPACK PACKAGE pkg_name [INTO path]
Example
$ tfactl ips unpack package IPSPKG_20061026010203 into /tmp/newadr
F.3.4 tfactl collection
Use the tfactl collection command to stop a running Oracle Trace File Analyzer collection.
Syntax
tfactl collection [stop collection_id]
You can only stop a collection using the tfactl collection command. You must provide a collection ID, which you can obtain by running the tfactl print command.
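Example

For example, assuming a hypothetical collection ID of 20170112345678 obtained from the tfactl print command, you might stop the collection as follows. Substitute the ID reported for your environment:

$ tfactl collection stop 20170112345678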
F.3.5 tfactl print
Use the tfactl print command to print information from the Berkeley database.
Syntax
tfactl print [status | config | directories | hosts | actions | repository | cookie]
Parameters
Table F-25 tfactl print Command Parameters
Parameter     Description
status        Displays the status of Oracle Trace File Analyzer across all nodes in the cluster. Also displays the Oracle Trace File Analyzer version and the port on which it is running.
config        Displays the current Oracle Trace File Analyzer configuration settings.
directories   Lists all the directories that Oracle Trace File Analyzer scans for trace or log file data. Also displays the location of the trace directories allocated for the database, Oracle ASM, and instance.
hosts         Lists the hosts that are part of the Oracle Trace File Analyzer cluster, and that can receive cluster-wide commands.
actions       Lists all the actions submitted to Oracle Trace File Analyzer, such as diagnostic collection. By default, tfactl print commands only display actions that are running or that have completed in the last hour.
repository    Displays the current location and amount of used space of the repository directory. Initially, the maximum size of the repository directory is the smaller of either 10 GB or 50% of available file system space. If the maximum size is exceeded or the file system space gets to 1 GB or less, then Oracle Trace File Analyzer suspends operations and closes the repository. Use the tfactl purge command to clear collections from the repository.
cookie        Generates and displays an identification code for use by the tfactl set command.
Example
The tfactl print config command returns output similar to the following:

$ tfactl print config
.-----------------------------------------------------------------------+------------.
|                                       node1                                        |
+-----------------------------------------------------------------------+------------+
| Configuration Parameter                                               | Value      |
+-----------------------------------------------------------------------+------------+
| TFA Version                                                           | 12.2.1.0.0 |
| Java Version                                                          | 1.8        |
| Public IP Network                                                     | true       |
| Automatic Diagnostic Collection                                       | true       |
| Alert Log Scan                                                        | true       |
| Disk Usage Monitor                                                    | true       |
| Managelogs Auto Purge                                                 | false      |
| Trimming of files during diagcollection                               | true       |
| Inventory Trace level                                                 | 1          |
| Collection Trace level                                                | 1          |
| Scan Trace level                                                      | 1          |
| Other Trace level                                                     | 1          |
| Repository current size (MB)                                          | 5          |
| Repository maximum size (MB)                                          | 10240      |
| Max Size of TFA Log (MB)                                              | 50         |
| Max Number of TFA Logs                                                | 10         |
| Max Size of Core File (MB)                                            | 20         |
| Max Collection Size of Core Files (MB)                                | 200        |
| Minimum Free Space to enable Alert Log Scan (MB)                      | 500        |
| Time interval between consecutive Disk Usage Snapshot(minutes)        | 60         |
| Time interval between consecutive Managelogs Auto Purge(minutes)      | 60         |
| Logs older than the time period will be auto purged(days[d]|hours[h]) | 30d        |
| Automatic Purging                                                     | true       |
| Age of Purging Collections (Hours)                                    | 12         |
| TFA IPS Pool Size                                                     | 5          |
'-----------------------------------------------------------------------+------------'
In the preceding sample output:

•  Automatic diagnostic collection: When ON (default is OFF), if scanning an alert log, then finding specific events in those logs triggers diagnostic collection.

•  Trimming of files during diagcollection: Determines if Oracle Trace File Analyzer trims large files to contain only data that is within the specified time ranges. When trimming is OFF, no trimming of trace files occurs for automatic diagnostic collection.

•  Repository current size in MB: How much space in the repository is used.

•  Repository maximum size in MB: The maximum size of storage space in the repository. Initially, the maximum size is set to the smaller of either 10 GB or 50% of free space in the file system.

•  Trace Level: 1 is the default, and the values 2, 3, and 4 have increasing verbosity. While you can set the trace level dynamically while the Oracle Trace File Analyzer daemon is running, increasing the trace level significantly impacts the performance of Oracle Trace File Analyzer. Increase the trace level only at the request of My Oracle Support.

•  Automatic Purging: Automatic purging of Oracle Trace File Analyzer collections is enabled by default. Oracle Trace File Analyzer collections are purged if their age exceeds the value of Minimum Age of Collections to Purge, and the repository space is exhausted.

•  Minimum Age of Collections to Purge (Hours): The minimum number of hours that Oracle Trace File Analyzer keeps a collection, after which Oracle Trace File Analyzer purges the collection. You can set the number of hours using the tfactl set minagetopurge=hours command.

•  Minimum Space free to enable Alert Log Scan (MB): The space limit, in MB, at which Oracle Trace File Analyzer temporarily suspends alert log scanning until space becomes free. Oracle Trace File Analyzer does not store alert log events if space on the file system used for the metadata database falls below the limit.
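For example, to keep collections for at least 48 hours before they become eligible for purging, you might run the following. The value 48 is illustrative; choose a retention period appropriate for your repository size:

$ tfactl set minagetopurge=48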
F.3.6 tfactl purge
Use the tfactl purge command to delete diagnostic collections from the Oracle Trace File Analyzer repository that are older than a specific time.
Syntax
tfactl purge -older number[h | d]
Example
The following command removes files older than 30 days:
$ tfactl purge -older 30d
F.3.7 tfactl managelogs
Use the tfactl managelogs command to manage Automatic Diagnostic Repository log and trace files.
Syntax
tfactl managelogs [-purge [[-older nm|h|d] | [-gi] | [-database all|d1,d2,...]]]
[-show [usage|variation] [[-older nd] | [-gi] | [-database all|d1,d2,...]]]
Parameters
Table F-26 tfactl managelogs Purge Options
Purge Option  Description
-older        Time period for purging logs.
-gi           Purges Oracle Grid Infrastructure logs (all Automatic Diagnostic Repository homes under GIBASE/diag, and crsdata (cvu dirs)).
-database     Purges Oracle Database logs (default is all, else provide a list).
-dryrun       Estimates the logs that a purge command would clear.
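For example, to estimate how much log data a purge of files older than 30 days would clear, without deleting anything, you might run the following. The 30d value is illustrative:

$ tfactl managelogs -purge -older 30d -dryrun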
Table F-27 tfactl managelogs Show Options
Show Option  Description
-older       Time period for change in log volume.
-gi          Shows space utilization under GIBASE.
-database    Shows space utilization for Oracle Database logs (default is all, else provide a list).
Example
$ tfactl managelogs -show usage -gi
Output from host : node3
------------------------------
.----------------------------------------------------------------------------+-----------.
|                              Grid Infrastructure Usage                                 |
+----------------------------------------------------------------------------+-----------+
| Location                                                                   | Size      |
+----------------------------------------------------------------------------+-----------+
| /scratch/app/orabase/diag/clients/user_grid/host_1389480572_107/alert      | 8.00 KB   |
| /scratch/app/orabase/diag/clients/user_grid/host_1389480572_107/incident   | 4.00 KB   |
| /scratch/app/orabase/diag/clients/user_grid/host_1389480572_107/trace      | 1.55 MB   |
| /scratch/app/orabase/diag/clients/user_grid/host_1389480572_107/cdump      | 4.00 KB   |
| /scratch/app/orabase/diag/clients/user_oracle/host_1389480572_107/alert    | 8.00 KB   |
| /scratch/app/orabase/diag/clients/user_oracle/host_1389480572_107/incident | 4.00 KB   |
| /scratch/app/orabase/diag/clients/user_oracle/host_1389480572_107/trace    | 712.00 KB |
| /scratch/app/orabase/diag/clients/user_oracle/host_1389480572_107/cdump    | 4.00 KB   |
| /scratch/app/orabase/diag/tnslsnr/node3/listener/alert                     | 921.39 MB |
| /scratch/app/orabase/diag/tnslsnr/node3/listener/incident                  | 4.00 KB   |
| /scratch/app/orabase/diag/tnslsnr/node3/listener/trace                     | 519.20 MB |
| /scratch/app/orabase/diag/tnslsnr/node3/listener/cdump                     | 4.00 KB   |
| /scratch/app/orabase/diag/tnslsnr/node3/listener_scan2/alert               | 726.55 MB |
| /scratch/app/orabase/diag/tnslsnr/node3/listener_scan2/incident            | 4.00 KB   |
| /scratch/app/orabase/diag/tnslsnr/node3/listener_scan2/trace               | 339.90 MB |
| /scratch/app/orabase/diag/tnslsnr/node3/listener_scan2/cdump               | 4.00 KB   |
| /scratch/app/orabase/diag/diagtool/user_grid/adrci_1389480572_107/alert    | 8.00 KB   |
| /scratch/app/orabase/diag/diagtool/user_grid/adrci_1389480572_107/incident | 4.00 KB   |
| /scratch/app/orabase/diag/diagtool/user_grid/adrci_1389480572_107/trace    | 12.00 KB  |
| /scratch/app/orabase/diag/diagtool/user_grid/adrci_1389480572_107/cdump    | 4.00 KB   |
| /scratch/app/orabase/diag/diagtool/user_grid/adrci_1389480572_107/hm       | 4.00 KB   |
| /scratch/app/orabase/diag/crs/node3/crs/alert                              | 44.00 KB  |
| /scratch/app/orabase/diag/crs/node3/crs/incident                           | 4.00 KB   |
| /scratch/app/orabase/diag/crs/node3/crs/trace                              | 1.67 GB   |
| /scratch/app/orabase/diag/crs/node3/crs/cdump                              | 4.00 KB   |
| /scratch/app/orabase/diag/asmtool/user_grid/host_1389480572_107/alert      | 8.00 KB   |
| /scratch/app/orabase/diag/asmtool/user_grid/host_1389480572_107/incident   | 4.00 KB   |
| /scratch/app/orabase/diag/asmtool/user_grid/host_1389480572_107/trace      | 8.00 KB   |
| /scratch/app/orabase/diag/asmtool/user_grid/host_1389480572_107/cdump      | 4.00 KB   |
| /scratch/app/orabase/diag/asmtool/user_root/host_1389480572_107/alert      | 20.00 KB  |
| /scratch/app/orabase/diag/asmtool/user_root/host_1389480572_107/incident   | 4.00 KB   |
| /scratch/app/orabase/diag/asmtool/user_root/host_1389480572_107/trace      | 8.00 KB   |
| /scratch/app/orabase/diag/asmtool/user_root/host_1389480572_107/cdump      | 4.00 KB   |
+----------------------------------------------------------------------------+-----------+
| Total                                                                      | 4.12 GB   |
'----------------------------------------------------------------------------+-----------'
$ tfactl managelogs -show variation -older 2h -gi
Output from host : node1
------------------------------
2016-09-30 00:49:57: INFO Checking space variation for 2 hours
.----------------------------------------------------------------------------+-----------+-----------.
|                                Grid Infrastructure Variation                                        |
+----------------------------------------------------------------------------+-----------+-----------+
| Directory                                                                  | Old Size  | New Size  |
+----------------------------------------------------------------------------+-----------+-----------+
| /scratch/app/orabase/diag/tnslsnr/node1/listener_scan2/trace               | 12.00 KB  | 12.00 KB  |
| /scratch/app/orabase/diag/tnslsnr/node1/listener_scan2/incident            | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/asmtool/user_root/host_1342558790_107/cdump      | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/tnslsnr/node1/listener_scan3/cdump               | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/crs/node1/crs/alert                              | 328.00 KB | 404.00 KB |
| /scratch/app/orabase/diag/asmtool/user_grid/host_1342558790_107/incident   | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/tnslsnr/node1/listener_scan2/alert               | 16.00 KB  | 16.00 KB  |
| /scratch/app/orabase/diag/tnslsnr/node1/listener/cdump                     | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/asmtool/user_root/host_1342558790_107/trace      | 8.00 KB   | 8.00 KB   |
| /scratch/app/orabase/diag/crs/node1/crs/incident                           | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/tnslsnr/node1/listener_scan3/incident            | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/asmtool/user_root/host_1342558790_107/incident   | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/tnslsnr/node1/listener_scan1/alert               | 12.00 KB  | 12.00 KB  |
| /scratch/app/orabase/diag/clients/user_grid/host_1342558790_107/trace      | 1.95 MB   | 2.42 MB   |
| /scratch/app/orabase/diag/tnslsnr/node1/listener_scan3/alert               | 562.34 MB | 726.93 MB |
| /scratch/app/orabase/diag/tnslsnr/node1/listener_scan1/incident            | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/tnslsnr/node1/listener/incident                  | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/crs/node1/crs/cdump                              | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/tnslsnr/node1/listener/trace                     | 307.22 MB | 394.32 MB |
| /scratch/app/orabase/diag/asmtool/user_grid/host_1342558790_107/trace      | 12.00 KB  | 12.00 KB  |
| /scratch/app/orabase/diag/clients/user_grid/host_1342558790_107/cdump      | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/clients/user_grid/host_1342558790_107/incident   | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/clients/user_oracle/host_1342558790_107/cdump    | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/asmtool/user_grid/host_1342558790_107/cdump      | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/clients/user_oracle/host_1342558790_107/incident | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/tnslsnr/node1/listener_scan1/trace               | 8.00 KB   | 8.00 KB   |
| /scratch/app/orabase/diag/tnslsnr/node1/listener_scan1/cdump               | 4.00 KB   | 4.00 KB   |
| /scratch/app/orabase/diag/tnslsnr/node1/listener_scan3/trace               | 263.64 MB | 340.29 MB |
| /scratch/app/orabase/diag/tnslsnr/node1/listener/alert                     | 586.36 MB | 752.10 MB |
| /scratch/app/orabase/diag/clients/user_oracle/host_1342558790_107/trace    | 1.17 MB   | 1.17 MB   |
| /scratch/app/orabase/diag/clients/user_grid/host_1342558790_107/alert      | 16.00 KB  | 16.00 KB  |
| /scratch/app/orabase/diag/clients/user_oracle/host_1342558790_107/alert    | 8.00 KB   | 8.00 KB   |
| /scratch/app/orabase/diag/crs/node1/crs/trace                              | 1.63 GB   | 1.84 GB   |
| /scratch/app/orabase/diag/asmtool/user_grid/host_1342558790_107/alert      | 12.00 KB  | 12.00 KB  |
| /scratch/app/orabase/diag/asmtool/user_root/host_1342558790_107/alert      | 12.00 KB  | 20.00 KB  |
| /scratch/app/orabase/diag/tnslsnr/node1/listener_scan2/cdump               | 4.00 KB   | 4.00 KB   |
'----------------------------------------------------------------------------+-----------+-----------'
Index

A

analysis
automated risk identification
Automatic Diagnostic Repository (ADR)
Automatic Diagnostic Repository log file
automatic purging
AUTORUN_FLAGS
AUTORUN_SCHEDULE
availability issues

C

CA-signed certificate
commands
    chactl config
    chactl monitor
    chactl query calibration
    chactl query diagnosis
    chactl remove
    chactl rename
    chactl resize repository
    chactl set maxretention
    chactl status
Cluster Health Monitor
    collecting Cluster Health Monitor data
    data, historical
Cluster Health Monitor services
    cluster logger
cluster resource activity log
cluster resource failures, monitoring
cluster_database
CPUS
create incident tickets
CRSCTL commands
    get calog maxsize
    set calog maxsize
    set calog retentiontime
custom application integration

D

daemon
    force stop
    info
    initsetup
    nextautorun
    passwordless SSH
    start
    status
daemon mode
data redaction
DEVICES node view
diagnostics collection script
diff report

E

Elasticsearch
Expect utility

F

FILESYSTEMS node view

G

Grid Infrastructure Management Repository

H

Hang Manager
health check score and summary

I

Interconnects page, monitoring Oracle Clusterware with Oracle

J

Java keytool
JSON output results

K

KPISET parameters

M

manage diagnostic collections
Maximum Availability Architecture (MAA)
Memory Guard

N

NICS
node view, defined
node views
    DEVICES
    NICS
non-daemon mode
NOTIFICATION_EMAIL

O

OCLUMON commands
    dumpnodeview
    manage
Oracle Cluster Health Advisor
Oracle Cluster Health Advisor daemon
Oracle Cluster Health Advisor model
    monitoring resources
    monitoring with Oracle Enterprise Manager
    resources, monitoring
Oracle Database QoS Management
    demand surges
    open workloads
    resources, waits
    response times, work requests
    metrics
    workloads
Oracle Clusterware, using the Interconnects page to monitor
Oracle Grid Infrastructure
Oracle Health Check Collections Manager
    bulk mapping systems to business units
    email notification system
    failed uploads
    selectively capture users during logon
    upload collections automatically
    user-defined checks
Oracle ORAchk and EXAchk
    command-line options
    file attribute changes
    recheck changes
    remove snapshots
    restrict system checks
    snapshots
    scope of checks
    uploading results to database
Oracle ORAchk and Oracle EXAchk
    AUTORUN_SCHEDULE
    collection_retention
    daemon
    NOTIFICATION_EMAIL
    sendemail
    testemail
Oracle ORAchk and Oracle EXAchk prerequisites
    Expect utility
    run as Oracle Grid Infrastructure home owner, root
Oracle RAC
Oracle Real Application Clusters (Oracle RAC)
Oracle Trace File Analyzer
    automated diagnostic collections
    configuration
    configure ports
    managing Oracle Trace File Analyzer
    on-demand diagnostic collections
    custom collections
        changing the collection
        collecting incident packaging service packages
        copying zip files
        preventing collecting core
        specific components
        specific nodes
    default collections
    types
    starting
    status
    TFACTL
Oracle Trace File Analyzer architecture
Oracle Trace File Analyzer Collector
    products
    supported platforms
Oracle Trace File Analyzer log analyzer utility
OSWatcher

P

patch set updates
performance issues, database server
privileged user, finding
PROCESSES node view
PROTOCOL ERRORS node view

R

remote login
report overview
resources

S

schedule email health check reports
silent mode operation
    exclude root access
    include root access
skipped checks
SRVCTL commands
    srvctl config cha
    srvctl status cha
SSL protocols
subsequent email
sudo
SYSTEM node view

T

TFACTL commands
    tfactl events
    tfactl host
    tfactl ips
    tfactl ips ADD
    tfactl ips ADD NEW INCIDENTS
    tfactl ips DELETE PACKAGE
    tfactl ips GET MANIFEST FROM FILE
    tfactl ips GET METADATA
    tfactl ips GET REMOTE KEYS FILE
    tfactl ips PACK
    tfactl ips REMOVE
    tfactl ips REMOVE FILE
    tfactl ips SHOW CONFIGURATION
    tfactl ips SHOW FILES PACKAGE
    tfactl ips SHOW INCIDENTS PACKAGE
    tfactl ips USE REMOTE KEYS FILE
    tfactl print
    tfactl purge
    tfactl set
    tfactl toolstatus
Oracle Trace File Analyzer command-line utility
TOP CONSUMERS
Trace File Analyzer
    disk usage snapshots
Troubleshoot
    EXAchk

U

unlockcells
user
    add
    remove
- 200 E.3 chactl status
- 202 E.4 chactl config
- 202 E.5 chactl calibrate
- 204 E.6 chactl query diagnosis
- 206 E.7 chactl query model
- 207 E.8 chactl query repository
- 207 E.9 chactl query calibration
- 210 E.10 chactl remove model
- 210 E.11 chactl rename model
- 210 E.12 chactl export model
- 211 E.13 chactl import model
- 211 E.14 chactl set maxretention
- 212 E.15 chactl resize repository
- 213 F Oracle Trace File Analyzer Command-Line and Shell Options
- 214 F.1 Running Administration Commands
- 215 F.1.1 tfactl diagnosetfa
- 215 F.1.2 tfactl host
- 216 F.1.3 tfactl set
- 217 F.1.4 tfactl access
- 219 F.2 Running Summary and Analysis Commands
- 219 F.2.1 tfactl summary
- 220 F.2.2 tfactl changes
- 222 F.2.3 tfactl events
- 223 F.2.4 tfactl analyze
- 226 F.2.5 tfactl run
- 227 F.2.6 tfactl toolstatus
- 228 F.3 Running Diagnostic Collection Commands
- 229 F.3.1 tfactl diagcollect
- 232 F.3.2 tfactl directory
- 234 F.3.3 tfactl ips
- 237 F.3.3.1 tfactl ips ADD
- 238 F.3.3.2 tfactl ips ADD FILE
- 238 F.3.3.3 tfactl ips COPY IN FILE
- 239 F.3.3.4 tfactl ips REMOVE
- 239 F.3.3.5 tfactl ips REMOVE FILE
- 239 F.3.3.6 tfactl ips ADD NEW INCIDENTS PACKAGE
- 240 F.3.3.7 tfactl ips GET REMOTE KEYS FILE
- 240 F.3.3.8 tfactl ips USE REMOTE KEYS FILE
- 240 F.3.3.9 tfactl ips CREATE PACKAGE
- 242 F.3.3.10 tfactl ips FINALIZE PACKAGE
- 242 F.3.3.11 tfactl ips GENERATE PACKAGE
- 242 F.3.3.12 tfactl ips DELETE PACKAGE
- 243 F.3.3.13 tfactl ips GET MANIFEST FROM FILE
- 243 F.3.3.14 tfactl ips GET METADATA
- 243 F.3.3.15 tfactl ips PACK
- 245 F.3.3.16 tfactl ips SET CONFIGURATION
- 245 F.3.3.17 tfactl ips SHOW CONFIGURATION
- 245 F.3.3.18 tfactl ips SHOW PACKAGE
- 246 F.3.3.19 tfactl ips SHOW FILES PACKAGE
- 246 F.3.3.20 tfactl ips SHOW INCIDENTS PACKAGE
- 246 F.3.3.21 tfactl ips SHOW PROBLEMS
- 247 F.3.3.22 tfactl ips UNPACK FILE
- 247 F.3.3.23 tfactl ips UNPACK PACKAGE
- 247 F.3.4 tfactl collection
- 247 F.3.5 tfactl print
- 250 F.3.6 tfactl purge
- 250 F.3.7 tfactl managelogs
- 256 Index