Troubleshooting and Diagnosing Oracle
Database 12.2 and Oracle RAC
https://www.linkedin.com/in/raosandesh/
sandeshr
Sandesh Rao, Senior Director, RAC Development
Copyright © 2017, Oracle and/or its affiliates. All rights reserved.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Common Questions
• How do I contact you?
– LinkedIn – Sandesh Rao
– Email – [email protected]
• Where do I get your presentation?
– http://otnyathra.in/downloads/
• Which books on RAC do I read for basics or internals?
– Oracle Database 11g Oracle Real Application Clusters Handbook, 2nd Edition (Oracle Press)
– Pro Oracle Database 11g RAC on Linux (Expert's Voice in Oracle), 2nd Edition
– Oracle 10g RAC Grid, Services and Clustering, 1st Edition
– Pro Oracle Database 10g RAC on Linux: Installation, Administration, and Performance (Expert's Voice in Oracle), 1st Corrected Edition, 3rd printing
– Oracle Database 12c Release 2 Oracle Real Application Clusters Handbook: Concepts, Administration, Tuning & Troubleshooting (Oracle Press), 1st Edition
– Documentation – Autonomous Computing Guide, RAC Admin Guide
Agenda
• Architectural Overview
• Troubleshooting Scenarios
• Proactive and Reactive Tools
• Q&A
Grid Infrastructure Overview
• Grid Infrastructure is the name for the combination of
– Oracle Cluster Ready Services (CRS)
– Oracle Automatic Storage Management (ASM)
• The Grid Home contains the software for both products
• CRS can also be Standalone for ASM and/or Oracle Restart
• CRS can run by itself or in combination with other vendor clusterware
• Grid Home and RDBMS home must be installed in different locations
– The installer locks the Grid Home path by setting root permissions.
Grid Infrastructure Overview
• CRS requires a shared Oracle Cluster Registry (OCR) and Voting files
– Must be in ASM or CFS
– OCR is backed up automatically every 4 hours to GIHOME/cdata
– Backups kept: 4, 8, 12 hours, 1 day, 1 week
– Restored with ocrconfig
– Voting file is backed up into the OCR at each change.
– Voting file is restored with crsctl
Grid Infrastructure Overview
• For the network, CRS requires
– One or multiple high-speed, low-latency, redundant private networks for internode communications
– Think of the interconnect as a memory backplane for the cluster
– Should be a separate physical network or a managed converged network
– VLANs are supported
– Used for:
• Clusterware messaging
• RDBMS messaging and block transfer
• ASM messaging
• HANFS for block traffic
Grid Infrastructure Overview
• Only one set of Clusterware daemons can run on each node
• The CRS stack is spawned from the Oracle HA Services Daemon (ohasd)
• On Unix, ohasd runs out of inittab with respawn
• A node can be evicted when deemed unhealthy
– May require a reboot, but at least a CRS stack restart (rebootless restart)
– IPMI integration, or diskmon in the case of Exadata
• CRS provides Cluster Time Synchronization services
– Always runs, but in observer mode if ntpd is configured
Grid Infrastructure Processes
Agents change everything
• Multi-threaded daemons
• Manage multiple resources and types
• Implement entry points for multiple resource types
– Start, stop, check, clean, fail
• oraagent, orarootagent, application agent, script agent, cssdagent
• Single process started from init on Unix (ohasd)
• Diagram below shows all core resources
Grid Infrastructure Processes
[Diagram: core Grid Infrastructure resources grouped by startup level: Level 0, Level 1, Level 2a, Level 2b, Level 3, Level 4a, Level 4b]
Grid Infrastructure Processes
Init Scripts
• /etc/init.d/ohasd (location O/S dependent)
– RC script with “start” and “stop” actions
– Initiates Oracle Clusterware autostart
– Control file coordinates with CRSCTL
• /etc/init.d/init.ohasd (location O/S dependent)
– OHASD Framework Script runs from init/upstart
– Control file coordinates with CRSCTL
– Named pipe syncs with OHASD
Grid Infrastructure Processes
• Level 1: OHASD spawns:
– cssdagent - Agent responsible for spawning CSSD
– orarootagent - Agent responsible for managing all root owned ohasd resources
– oraagent - Agent responsible for managing all oracle owned ohasd resources
– cssdmonitor - Monitors CSSD and node health (along with the cssdagent)
• Level 2a: OHASD rootagent spawns:
– CRSD - Primary daemon responsible for managing cluster resources.
– CTSSD - Cluster Time Synchronization Services Daemon
– Diskmon (Exadata)
– ACFS (ASM Cluster File System) Drivers
Grid Infrastructure Processes
• Level 2b: OHASD oraagent spawns:
– MDNSD – Multicast DNS daemon
– GIPCD – Grid IPC Daemon
– GPNPD – Grid Plug and Play Daemon
– EVMD – Event Monitor Daemon
– ASM – ASM instance started here as it may be required by CRSD
• Level 3: CRSD spawns:
– orarootagent - Agent responsible for managing all root owned crsd resources.
– oraagent - Agent responsible for managing all non-root owned crsd resources.
• One is spawned for every user that has CRS resources to manage.
Grid Infrastructure Processes
Startup Sequence
• Level 4: CRSD oraagent spawns:
– ASM Resource - ASM Instance(s) resource (proxy resource)
– Diskgroup - Used for managing/monitoring ASM diskgroups.
– DB Resource - Used for monitoring and managing the DB and instances
– SCAN Listener - Listener for single client access name, listening on SCAN VIP
– Listener - Node listener listening on the Node VIP
– Services - Used for monitoring and managing services
– ONS - Oracle Notification Service
– eONS - Enhanced Oracle Notification Service (pre 11.2.0.2)
– GSD - For 9i backward compatibility
– GNS (optional) - Grid Naming Service - Performs name resolution
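The spawn levels on the preceding slides can be read as a parent/child tree, which helps decide which parent to check first when a daemon is missing. The following is a minimal Python sketch under that reading. The dictionary keys (including the "(ohasd)" and "(crsd)" suffixes that tell the two agent sets apart) and the `startup_chain` helper are illustrative names, not part of any Oracle tool.

```python
# Parent of each process, transcribed from Levels 1-3 on the slides.
# The "(ohasd)" / "(crsd)" suffixes are labels invented for this sketch.
SPAWNED_BY = {
    "cssdagent": "ohasd",
    "cssdmonitor": "ohasd",
    "orarootagent (ohasd)": "ohasd",
    "oraagent (ohasd)": "ohasd",
    "cssd": "cssdagent",
    "crsd": "orarootagent (ohasd)",
    "ctssd": "orarootagent (ohasd)",
    "diskmon": "orarootagent (ohasd)",
    "mdnsd": "oraagent (ohasd)",
    "gipcd": "oraagent (ohasd)",
    "gpnpd": "oraagent (ohasd)",
    "evmd": "oraagent (ohasd)",
    "asm": "oraagent (ohasd)",
    "orarootagent (crsd)": "crsd",
    "oraagent (crsd)": "crsd",
}


def startup_chain(process):
    """Return the chain of parents from ohasd down to `process`."""
    if process != "ohasd" and process not in SPAWNED_BY:
        raise KeyError(f"unknown process: {process}")
    chain = [process]
    # Walk upwards until a process without a recorded parent (ohasd) is reached.
    while chain[-1] in SPAWNED_BY:
        chain.append(SPAWNED_BY[chain[-1]])
    return chain[::-1]


if __name__ == "__main__":
    print(" -> ".join(startup_chain("oraagent (crsd)")))
```

If the CRSD oraagent is not running, the chain suggests checking ohasd, then the OHASD orarootagent, then crsd, in that order.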
Oracle Flex Cluster
The standard going forward
(every Oracle 12c Rel. 2 cluster is a Flex Cluster by default.)
Under the Hood: Any New Install Ends Up in a Flex Cluster
[GRID]> crsctl get cluster name
CRS-6724: Current cluster name is 'SolarCluster'
[GRID]> crsctl get cluster class
CRS-41008: Cluster class is 'Standalone Cluster'
[GRID]> crsctl get cluster type
CRS-6539: The cluster type is 'flex'.
Cluster Domain
[Diagram: a Cluster Domain in which four member clusters are connected over the private network to a Domain Services Cluster (DSC).
Member clusters shown: a Database Member Cluster that uses local ASM; an Application Member Cluster (GI only); a Database Member Cluster that uses the ASM Service; a Database Member Cluster that uses the IO & ASM Service of the DSC.
Domain Services Cluster services shown: Mgmt Repository (GIMR) Service, Trace File Analyzer (TFA) Service, Rapid Home Provisioning (RHP) Service, ASM Service, IO Service, and additional optional services, on shared ASM backed by SAN and NAS storage.]
ASM Flex Diskgroups 1
Database-oriented Storage Management for more flexibility and availability
[Diagram, left: Pre-12.2 diskgroup organization. One diskgroup holds the files of DB1, DB2 and DB3 intermixed (shared resource management).]
[Diagram, right: 12.2 Flex Diskgroup organization. The flex diskgroup contains one file group per database (DB1: File 1-3, DB2: File 1-4, DB3: File 1-3), giving database-oriented resource management.]
ASM Flex Diskgroups 2
Database-oriented Storage Management for more flexibility and availability
12.2 Flex Diskgroup Organization
• Flex Diskgroups enable
– Quota Management - limit the space databases can allocate in a diskgroup and thereby improve the customers’ ability to consolidate databases into fewer DGs
– Redundancy Change – utilize lower redundancy for less critical databases
– Shadow Copies (“split mirrors”) to easily and dynamically create database clones for test/dev or production databases
[Diagram: a Flex Diskgroup with a quota and file groups for DB1 (File 1-3), DB2 (File 1-4) and DB3 (File 1-3), plus a second copy of DB3 (File 1-3)]
Node Weighting in Oracle RAC 12c Release 2
Idea: Everything equal, let the majority of work survive
• Node Weighting is a new feature that considers the workload hosted in the cluster during fencing
• The idea is to let the majority of work survive, if everything else is equal
– Example: In a 2-node cluster, the node hosting the majority of services (at fencing time) is meant to survive
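The 2-node example can be sketched as a small selection function. This is an illustration only, assuming the workload is measured purely as a count of services per node. The function name `pick_survivor` and the tie-break on the lowest node number are assumptions of this sketch, not a description of the actual Clusterware algorithm.

```python
def pick_survivor(services_per_node):
    """Pick the node that is meant to survive fencing.

    services_per_node maps node number -> number of services hosted
    at fencing time.
    """
    if not services_per_node:
        raise ValueError("no nodes given")
    # Most services wins; on a tie fall back to the lowest node number
    # (the tie-break is an assumption made for this sketch).
    return min(services_per_node, key=lambda n: (-services_per_node[n], n))


if __name__ == "__main__":
    print(pick_survivor({1: 2, 2: 5}))
```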
CSS_CRITICAL – Fencing with Manual Override
• CSS_CRITICAL can be set on various levels / components to mark them as “critical” so that the cluster will try to preserve them in case of a failure.
• CSS_CRITICAL will be honored if no other technical reason prohibits survival of the node which has at least one critical component at the time of failure.
• A fallback scheme is applied if CSS_CRITICAL settings do not lead to an actionable outcome.
srvctl modify database -help | grep critical
…
-css_critical {YES | NO}   Define whether the database or service is CSS critical
crsctl set server css_critical {YES|NO} + server restart
[Diagram callouts: “Node eviction despite WL; WL will failover.” and “Conflict”.]
Proven Features – Even More Beneficial on the DSC
• Autonomous Health Framework (powered by machine learning) works more efficiently for you on the DSC, as continuous analysis is taken off the production cluster.
• The DSC is the ideal hosting environment for Rapid Home Provisioning (RHP), enabling software fleet management.
• Oracle ASM 12c Rel. 2 based storage consolidation is best performed on the DSC, as it enables numerous additional features and use cases.
Node Eviction Basics
Basic RAC Cluster with Oracle Clusterware
[Diagram: cluster nodes, each running CSSD, attached to the public LAN, linked to each other by the private LAN / interconnect, and connected through the SAN network to the shared voting disk]
What does CSSD do?
CSSD monitors and evicts nodes
• Monitors nodes using 2 communication channels:
– Private Interconnect ⇔ Network Heartbeat
– Voting Disk based communication ⇔ Disk Heartbeat
• Evicts nodes (forcibly removes them from the cluster) depending on heartbeat feedback (failures)
Network Heartbeat
Interconnect basics
• Each node in the cluster is “pinged” every second
• Nodes must respond in css_misscount time (defaults to 30 secs.)
– Reducing the css_misscount time is generally not supported
• Network heartbeat failures will lead to node evictions
– CSSD-log: [date / time] [CSSD][1111902528]clssnmPollingThread: node mynodename (5) at 75% heartbeat fatal, removal in 6.770 seconds
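To make the arithmetic behind these warnings concrete, here is a small Python sketch that converts warning percentages into elapsed and remaining seconds. It assumes the default css_misscount of 30 seconds and the 50%, 75% and 90% warning levels that appear in the ocssd.log excerpts of this deck; the function name `heartbeat_warnings` is made up for illustration.

```python
def heartbeat_warnings(misscount_s=30, percents=(50, 75, 90)):
    """Return (percent, seconds_missed, seconds_until_removal) per warning level."""
    rows = []
    for pct in percents:
        missed = misscount_s * pct / 100
        rows.append((pct, missed, misscount_s - missed))
    return rows


if __name__ == "__main__":
    for pct, missed, remaining in heartbeat_warnings():
        print(f"{pct}% warning: ~{missed:.1f}s without heartbeat, ~{remaining:.1f}s to removal")
```

Real log lines report slightly smaller remaining times (for example 6.770 seconds at 75%) because the warning is written a little after the threshold is crossed.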
Disk Heartbeat
Voting Disk basics – Part 1
• Each node in the cluster “pings” (r/w) the Voting Disk(s) every second
• Nodes must receive a response in (long / short) diskTimeout time
– I/O errors indicate clear accessibility problems → timeout is irrelevant
• Disk heartbeat failures will lead to node evictions
– CSSD-log: … [CSSD] [1115699552] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(1) LATS(63436584) Disk lastSeqNo(1)
Voting Disk Structure
Voting Disk basics – Part 2
• Voting Disks contain dynamic and static data:
– Dynamic data: disk heartbeat logging
– Static data: information about the nodes in the cluster
• With 11.2.0.1 Voting Disks got an “identity”:
– E.g. Voting Disk serial number: [GRID]> crsctl query css votedisk
1. 2 1212f9d6e85c4ff7bf80cc9e3f533cc1 (/dev/sdd5) [DATA]
• Voting Disks must therefore not be copied using “dd” or “cp” anymore
[Diagram: voting disk layout with a node information area and a disk heartbeat logging area]
“Simple Majority Rule”
Voting Disk basics – Part 3
• Oracle supports redundant Voting Disks for disk failure protection
• “Simple Majority Rule” applies:
– Each node must “see” the simple majority of configured Voting Disks at all times in order not to be evicted (to remain in the cluster)
➢ trunc(n/2+1) with n = number of voting disks configured and n >= 1
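The majority formula is easy to tabulate. The sketch below evaluates trunc(n/2+1) for a few values of n and checks whether a node that can see a given number of voting disks stays in the cluster; the function names are illustrative only.

```python
def votes_needed(configured):
    """Simple majority: trunc(n/2 + 1) for n configured voting disks."""
    if configured < 1:
        raise ValueError("at least one voting disk must be configured")
    # For positive integers, n // 2 + 1 equals trunc(n/2 + 1).
    return configured // 2 + 1


def stays_in_cluster(visible, configured):
    """True if a node seeing `visible` voting disks keeps its majority."""
    return visible >= votes_needed(configured)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} configured -> {votes_needed(n)} must be visible")
```

Note that two voting disks need both to be visible, so an even number of voting disks adds no extra failure tolerance over the next lower odd number.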
Insertion 1: “Simple Majority Rule”…
… In extended Oracle clusters
• http://www.oracle.com/goto/rac
– Using standard NFS to support a third voting file for extended cluster configurations (PDF)
• Same principles apply
• Voting Disks are just geographically dispersed
Insertion 2: Voting Disk in Oracle ASM
The way of storing Voting Disks doesn’t change its use
[GRID]> crsctl query css votedisk
1. 2 1212f9d6e85c4ff7bf80cc9e3f533cc1 (/dev/sdd5) [DATA]
2. 2 aafab95f9ef84f03bf6e26adc2a3b0e8 (/dev/sde5) [DATA]
3. 2 28dd4128f4a74f73bf8653dabd88c737 (/dev/sdd6) [DATA]
Located 3 voting disk(s).
• Oracle ASM auto creates 1/3/5 Voting Files
– Based on Ext/Normal/High redundancy and on Failure Groups in the Disk Group
– Per default there is one failure group per disk
– ASM will enforce the required number of disks
– New failure group type: Quorum Failgroup
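Combining the 1/3/5 voting file counts with the simple majority rule gives the number of voting files that can be lost per redundancy level. A brief sketch follows; the dictionary and function names are made up for illustration.

```python
# Voting files ASM creates per disk group redundancy, as listed on the slide.
VOTING_FILES = {"external": 1, "normal": 3, "high": 5}


def tolerated_losses(redundancy):
    """Voting files that may become unavailable before the majority is lost."""
    n = VOTING_FILES[redundancy]
    majority = n // 2 + 1  # simple majority rule, trunc(n/2 + 1)
    return n - majority


if __name__ == "__main__":
    for level in VOTING_FILES:
        print(level, VOTING_FILES[level], tolerated_losses(level))
```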
Why are nodes evicted?
⇒ To prevent worse things from happening…
• Evicting (fencing) nodes is a preventive measure (a good thing)!
• Nodes are evicted to prevent consequences of a split brain:
– Shared data must not be written by independently operating nodes
– The easiest way to prevent this is to forcibly remove a node from the cluster
How are nodes evicted?
EXAMPLE: Heartbeat failure
• The network heartbeat between nodes has failed
– It is determined which nodes can still talk to each other
– A “kill request” is sent to the node(s) to be evicted
▪ Using all (remaining) communication channels → Voting Disk(s)
• A node is requested to “kill itself”; executer: typically CSSD
Re-bootless Node Fencing (restart)
Re-bootless Node Fencing (restart)
Fence the cluster, do not reboot the node
• Until Oracle Clusterware 11.2.0.2, fencing meant “re-boot”
• With Oracle Clusterware 11.2.0.2, re-boots will be seen less, because:
– Re-boots affect applications that might run on a node, but are not protected
– Customer requirement: prevent a reboot, just stop the cluster – implemented...
[Diagram: two nodes, each running a standalone application (App X, App Y), an Oracle RAC DB instance (Inst. 1, Inst. 2) and CSSD]
Re-bootless Node Fencing (restart)
How it works
• With Oracle Clusterware 11.2.0.2, re-boots will be seen less:
– Instead of fast re-booting the node, a graceful shutdown of the stack is attempted
• Then IO issuing processes are killed; it is made sure that no IO process remains
– For a RAC DB mainly the log writer and the database writer are of concern
[Diagram: node 1 runs standalone App X, Oracle RAC DB Inst. 1 and CSSD; node 2 runs standalone App Y and CSSD, with its RAC DB instance no longer shown]
Re-bootless Node Fencing (restart)
EXCEPTIONS
• With Oracle Clusterware 11.2.0.2, re-boots will be seen less, unless…:
– IF the check for a successful kill of the IO processes fails → reboot
– IF CSSD gets killed during the operation → reboot
– IF cssdmonitor is not scheduled → reboot
– IF the stack cannot be shut down in “short_disk_timeout” seconds → reboot
[Diagram: two nodes, each running a standalone application (App X, App Y), an Oracle RAC DB instance (Inst. 1, Inst. 2) and CSSD]
Troubleshooting Scenarios
Cluster Startup Problem Triage (11.2+)
[Flowchart: Cluster Startup Diagnostic Flow. Only the box contents survive extraction; the arrows between the boxes are not recoverable.]
• Startup sequence check (“Running?”):
– ps -ef | grep init.ohasd
– ps -ef | grep ohasd.bin
– Items listed on the “NO” branch: crsctl config crs, ohasd.log
• Daemon and agent check (“Running?”):
– ps -ef | grep cssdagent
– ps -ef | grep ocssd.bin
– ps -ef | grep orarootagent
– ps -ef | grep ctssd.bin
– ps -ef | grep crsd.bin
– ps -ef | grep cssdmonitor
– ps -ef | grep oraagent
– ps -ef | grep ora.asm
– ps -ef | grep gpnpd.bin
– ps -ef | grep mdnsd.bin
– ps -ef | grep evmd.bin
– Items listed on the “NO” branch: ohasd.log, agent logs, process logs, OLR perms, compare with a reference system
• Stack check:
– crsctl check crs
– crsctl check cluster
• Each check is followed by an “Obvious?” decision
• Escalation boxes: Engage Sysadmin Team, Engage Oracle Support, TFA Collector
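The "Running?" checks in the flow amount to looking for a fixed set of process names in `ps -ef` output. Below is a minimal Python sketch of that step. The process names are the ones listed in the flowchart (ora.asm is left out because it names a resource rather than a process). The function names and the sample path `/u01/grid` are hypothetical, and the result is only a first triage hint, not a substitute for `crsctl check crs`.

```python
import subprocess

# Process names taken from the "ps -ef | grep ..." boxes in the flowchart.
EXPECTED = (
    "init.ohasd", "ohasd.bin", "cssdagent", "cssdmonitor", "ocssd.bin",
    "orarootagent", "oraagent", "ctssd.bin", "crsd.bin",
    "gpnpd.bin", "mdnsd.bin", "evmd.bin",
)


def missing_processes(ps_output, expected=EXPECTED):
    """Return the expected process names that do not appear in ps output."""
    # Ignore grep commands so "grep crsd.bin" does not count as crsd.bin.
    lines = [ln for ln in ps_output.splitlines() if "grep" not in ln]
    return [name for name in expected if not any(name in ln for ln in lines)]


def current_ps_output():
    """Capture `ps -ef` from the local node (Unix only)."""
    result = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    sample = (
        "root 1001 1 /bin/sh /etc/init.d/init.ohasd run\n"
        "root 1002 1 /u01/grid/bin/ohasd.bin reboot\n"
        "grid 1003 900 grep crsd.bin\n"
    )
    print(missing_processes(sample))
```

On a live node, `missing_processes(current_ps_output())` returning an empty list means every listed daemon has at least one matching process.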
Troubleshooting Scenarios
Cluster Startup Problem Triage
• Multicast Domain Name Service Daemon (mDNS(d))
– Used by Grid Plug and Play to locate profiles in the cluster, as well as by GNS to perform name resolution. The mDNS process is a background process on Linux and UNIX and on Windows.
– Uses multicast for cache updates on service advertisement arrival/departure.
– Advertises/serves on all found node interfaces.
– Log is GI_HOME/log/<node>/mdnsd/mdnsd.log
Troubleshooting Scenarios
Cluster Startup Problem Triage
<?xml version="1.0" encoding="UTF-8"?>
<gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd" ProfileSequence="6" ClusterUId="b1eec1fcdd355f2bbf7910ce9cc4a228" ClusterName="staij-cluster"
PALocation="">
<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth0" Use="public"/>
<gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth1" Use="cluster_interconnect"/>
</gpnp:HostNetwork></gpnp:Network-Profile>
<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
<orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+SYSTEM/staij-cluster/asmparameterfile/registry.253.693925293"/>
<ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod
Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><ds:SignatureMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"><InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl
xsi"/></ds:Transform></ds:Transforms><ds:DigestMethod
Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>x1H9LWjyNyMn6BsOykHhMvxnP8U=</ds:DigestValue
></ds:Reference></ds:SignedInfo><ds:SignatureValue>N+20jG4=</ds:SignatureValue></ds:Signature>
</gpnp:GPnP-Profile>
Troubleshooting Scenarios
Cluster Startup Problem Triage
• cssdagent and monitor
– Same functionality in both agent and monitor
– Functionality of several pre-11.2 daemons consolidated in both
• OPROCD – system hang
• OMON – Oracle Clusterware monitor
• VMON – vendor clusterware monitor
– Run realtime with locked down memory, like CSSD
– Provides enhanced stability and diagnosability
– Logs are
• GI_HOME/log/<node>/agent/oracssdagent_root/oracssdagent_root.log
• GI_HOME/log/<node>/agent/oracssdmonitor_root/oracssdmonitor_root.log
• 12c – ORACLE_BASE/diag/node/agent/..
Troubleshooting Scenarios
Node Evictions
[Flowchart: Node Eviction Diagnostic Flow. Only the box contents survive extraction; the arrows between the boxes are not recoverable.]
• Starting point: Eviction Scenario
• Decision points: Fenced? NHB? DHB? Resource starvation? Obvious? Resolved?
• Evidence to review: system log, cluster alert log, ocssd.log, free memory, CPU load, node response, TFA Collector output
• Note numbers shown in the chart: 1531223.1, 1328466.1, 1050693.1, 1534949.1, 1546004.1, 1549428.1, 1466639.1
• Escalation boxes: engage networking team, engage storage team, engage sysadmin team, engage appropriate team, engage Oracle Support
Missing Network Heartbeat (1)
• ocssd.log from node 1
• ===> Sending network heartbeats to other nodes. Normally, this message is output once every 5 messages (seconds)
• 2016-08-13 17:00:20.023: [CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:20.023: [CSSD][4096109472]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> The network heartbeat is not received from node 2 (drrac2) for 15 consecutive seconds.
• ===> This means that 15 network heartbeats are missing; it is the first warning (50% threshold).
• 2016-08-13 17:00:22.818: [CSSD][4106599328]clssnmPollingThread: node drrac2 (2) at 50% heartbeat fatal, removal in 14.520 seconds
• 2016-08-13 17:00:22.818: [CSSD][4106599328]clssnmPollingThread: node drrac2 (2) is impending reconfig, flag 132108, misstime 15480
• ===> Continuing to send the network heartbeats and log messages once every 5 messages
• 2016-08-13 17:00:25.023: [CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:25.023: [CSSD][4096109472]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> 75% threshold of missing network heartbeats is reached. This is the second warning.
• 2016-08-13 17:00:29.833: [CSSD][4106599328]clssnmPollingThread: node drrac2 (2) at 75% heartbeat fatal, removal in 7.500 seconds
Missing Network Heartbeat (2)
• ===> Continuing to send the network heartbeats and log messages once every 5 messages
• 2016-08-13 17:00:30.023: [CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:30.023: [CSSD][4096109472]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> Continuing to send the network heartbeats, but the message is logged after 4 messages
• 2016-08-13 17:00:34.021: [CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:34.021: [CSSD][4096109472]clssnmSendingThread: sent 4 status msgs to all nodes
• ===> Last warning shows that the 90% threshold of missing network heartbeats is reached.
• ===> The eviction will occur in 2.49 seconds.
• 2016-08-13 17:00:34.841: [CSSD][4106599328]clssnmPollingThread: node drrac2 (2) at 90% heartbeat fatal, removal in 2.490 seconds, seedhbimpd 1
• ===> Eviction of node 2 (drrac2) started
• 2016-08-13 17:00:37.337: [CSSD][4106599328]clssnmPollingThread: Removal started for node drrac2 (2), flags 0x2040c, state 3, wt4c 0
• ===> This shows that node 2 is actively updating the voting disks
• 2016-08-13 17:00:37.340: [CSSD][4085619616]clssnmCheckSplit: Node 2, drrac2, is alive, DHB (1281744040, 1396854) more than disk timeout of 27000 after the last NHB (1281744011, 1367154)
Missing Network Heartbeat (3)
• ===> Evicting node 2 (drrac2)
• 2016-08-13 17:00:37.340: [CSSD][4085619616](:CSSNM00007:)clssnmrEvict: Evicting node 2, drrac2, from the cluster in incarnation 169934272, node birth incarnation 169934271, death incarnation 169934272, stateflags 0x24000
• ===> Reconfigured the cluster without node 2
• 2016-08-13 17:01:07.705: [CSSD][4043389856]clssgmCMReconfig: reconfiguration successful, incarnation 169934272 with 1 nodes, local node number 1, master node number 1
Missing Network Heartbeat (4)
• ocssd.log from node 2:
• ===> Logging the message to indicate 5 network heartbeats are sent to other nodes
• 2016-08-13 17:00:26.009: [CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:26.009: [CSSD][4062550944]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> First warning of reaching 50% threshold of missing network heartbeats
• 2016-08-13 17:00:26.213: [CSSD][4073040800]clssnmPollingThread: node drrac1 (1) at 50% heartbeat fatal, removal in 14.540 seconds
• 2016-08-13 17:00:26.213: [CSSD][4073040800]clssnmPollingThread: node drrac1 (1) is impending reconfig, flag 394254, misstime 15460
• ===> Logging the message to indicate 5 network heartbeats are sent to other nodes
• 2016-08-13 17:00:31.009: [CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:31.009: [CSSD][4062550944]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> Second warning of reaching 75% threshold of missing network heartbeats
• 2016-08-13 17:00:33.227: [CSSD][4073040800]clssnmPollingThread: node drrac1 (1) at 75% heartbeat fatal, removal in 7.470 seconds
Missing Network Heartbeat (5)
• ===> Logging the message to indicate 4 network heartbeats are sent
• 2016-08-13 17:00:35.009: [CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:35.009: [CSSD][4062550944]clssnmSendingThread: sent 4 status msgs to all nodes
• ===> Third warning of reaching 90% threshold of missing network heartbeats
• 2016-08-13 17:00:38.236: [CSSD][4073040800]clssnmPollingThread: node drrac1 (1) at 90% heartbeat fatal, removal in 2.460 seconds, seedhbimpd 1
• ===> Logging the message to indicate 5 network heartbeats are sent to other nodes
• 2016-08-13 17:00:40.008: [CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:40.009: [CSSD][4062550944]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> Eviction started for node 1 (drrac1)
• 2016-08-13 17:00:40.702: [CSSD][4073040800]clssnmPollingThread: Removal started for node drrac1 (1), flags 0x6040e, state 3, wt4c 0
• ===> Node 1 is actively updating the voting disk, so this is a split brain condition
• 2016-08-13 17:00:40.706: [CSSD][4052061088]clssnmCheckSplit: Node 1, drrac1, is alive, DHB (1281744036, 1243744) more than disk timeout of 27000 after the last NHB (1281744007, 1214144)
• 2016-08-13 17:00:40.706: [CSSD][4052061088]clssnmCheckDskInfo: My cohort: 2
• 2016-08-13 17:00:40.707: [CSSD][4052061088]clssnmCheckDskInfo: Surviving cohort: 1
Missing Network Heartbeat (6)
• ===> Node 2 is aborting itself to resolve the split brain and ensure the cluster integrity
• 2016-08-13 17:00:40.707: [CSSD][4052061088](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 2, drrac2, is smaller than cohort of 1 nodes led by node 1, drrac1, based on map type 2
• 2016-08-13 17:00:40.707: [CSSD][4052061088]###################################
• 2016-08-13 17:00:40.707: [CSSD][4052061088]clssscExit: CSSD aborting from thread clssnmRcfgMgrThread
• 2016-08-13 17:00:40.707: [CSSD][4052061088]###################################
Missing Network Heartbeat (7)
• Observations
1. Both nodes reported missing heartbeats at the same time
2. Both nodes sent heartbeats to other nodes all the time
3. Node 2 aborted itself to resolve split brain
• Conclusion
1. This is likely a network problem, engage network team
2. Check OSWatcher output (netstat and traceroute)
– Configure the private.net file, which is not configured by default
3. Check CHM
4. Check system log
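When reading long ocssd.log files, it can help to pull out only the clssnmPollingThread warnings and see how quickly they escalate. The following Python sketch extracts node name, node number, warning percentage and remaining seconds from such lines. The regular expression is written against the message layout shown in the excerpts in this deck and tolerates missing spaces; the names `POLL_RE` and `parse_poll_warning` are illustrative, and other Clusterware versions may word the message differently.

```python
import re

# Matches e.g. "node drrac2 (2) at 50% heartbeat fatal, removal in 14.520 seconds".
POLL_RE = re.compile(
    r"node\s*(\S+?)\s*\((\d+)\)\s*at\s*(\d+)%\s*heartbeat\s*fatal,"
    r"\s*removal\s*in\s*([\d.]+)\s*seconds"
)


def parse_poll_warning(line):
    """Return the warning details as a dict, or None if the line is not a warning."""
    match = POLL_RE.search(line)
    if match is None:
        return None
    return {
        "node": match.group(1),
        "number": int(match.group(2)),
        "percent": int(match.group(3)),
        "removal_in_s": float(match.group(4)),
    }


if __name__ == "__main__":
    line = ("2016-08-13 17:00:22.818: [CSSD][4106599328]clssnmPollingThread: "
            "node drrac2 (2) at 50% heartbeat fatal, removal in 14.520 seconds")
    print(parse_poll_warning(line))
```

Running this over the logs of both nodes and comparing timestamps is one way to confirm the first observation above, namely that both nodes started missing heartbeats at about the same time.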
Voting Disk Access Problem (1)
ocssd.log:
===> The first error indicating that it could not read the voting disk -- first message to indicate a problem accessing the voting disk
2016-08-13 18:31:19.787: [SKGFD][4131736480]ERROR: -9(Error 27072, OS Error (Linux Error: 5: Input/output error
Additional information: 4
Additional information: 721425
Additional information: -1)
)
2016-08-13 18:31:19.787: [CSSD][4131736480](:CSSNM00060:)clssnmvReadBlocks: read failed at offset 529 of /dev/sdb8
2016-08-13 18:31:19.802: [CSSD][4131736480]clssnmvDiskAvailabilityChange: voting file /dev/sdb8 now offline
Voting Disk Access Problem (2)
====> The error message that shows a problem accessing the voting disk repeats once every 4 seconds
2016-08-13 18:31:23.782: [CSSD][150477728]clssnmvDiskOpen: Opening /dev/sdb8
2016-08-13 18:31:23.782: [SKGFD][150477728]Handle 0xf43fc6c8 from lib :UFS:: for disk :/dev/sdb8:
2016-08-13 18:31:23.782: [CLSF][150477728]Opened hdl:0xf4365708 for dev:/dev/sdb8:
2016-08-13 18:31:23.787: [SKGFD][150477728]ERROR: -9(Error 27072, OS Error (Linux Error: 5: Input/output error
Additional information: 4
Additional information: 720913
Additional information: -1)
)
2016-08-13 18:31:23.787: [CSSD][150477728](:CSSNM00060:)clssnmvReadBlocks: read failed at offset 17 of /dev/sdb8
Voting Disk Access Problem (3)
====> The last error that shows a problem accessing the voting disk.
====> Note that the last message is 200 seconds after the first message
====> because the long disktimeout is 200 seconds
2016-08-13 18:34:37.423: [CSSD][150477728]clssnmvDiskOpen: Opening /dev/sdb8
2016-08-13 18:34:37.423: [CLSF][150477728]Opened hdl:0xf4336530 for dev:/dev/sdb8:
2016-08-13 18:34:37.429: [SKGFD][150477728]ERROR: -9(Error 27072, OS Error (Linux Error: 5: Input/output error
Additional information: 4
Additional information: 720913
Additional information: -1)
)
2016-08-13 18:34:37.429: [CSSD][150477728](:CSSNM00060:)clssnmvReadBlocks: read failed at offset 17 of /dev/sdb8
Voting Disk Access Problem (4)
====> This message shows that ocssd.bin tried accessing the voting disk for 200 seconds
2016-08-13 18:34:38.205: [CSSD][4110736288](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for 200880 ms for voting file /dev/sdb8)
====> ocssd.bin aborts itself with an error message that the majority of voting disks are not available. In this case, there was only one voting disk, but if three voting disks were available, as long as two voting disks are accessible, ocssd.bin will not abort.
2016-08-13 18:34:38.206: [CSSD][4110736288](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1
2016-08-13 18:34:38.206: [CSSD][4110736288]###################################
2016-08-13 18:34:38.206: [CSSD][4110736288]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
2016-08-13 18:34:38.206: [CSSD][4110736288]###################################
• Conclusion
The voting disk was not available, engage storage team
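The relationship between the first and last read error and the long disk timeout can be checked directly from the timestamps. The sketch below computes the elapsed time between two log timestamps and extracts the stall duration from a CSSNM00058 message. The timestamp format and message text follow the excerpts in this deck; the helper names are illustrative.

```python
import re
from datetime import datetime

STAMP = "%Y-%m-%d %H:%M:%S.%f"
STALL_RE = re.compile(r"No\s*I/O\s*completions\s*for\s*(\d+)\s*ms")


def seconds_between(first, last):
    """Elapsed seconds between two 'YYYY-MM-DD HH:MM:SS.mmm' timestamps."""
    delta = datetime.strptime(last, STAMP) - datetime.strptime(first, STAMP)
    return delta.total_seconds()


def io_stall_seconds(line):
    """Stall length in seconds from a clssnmvDiskCheck message, or None."""
    match = STALL_RE.search(line)
    return int(match.group(1)) / 1000 if match else None


if __name__ == "__main__":
    print(seconds_between("2016-08-13 18:31:19.787", "2016-08-13 18:34:37.429"))
    print(io_stall_seconds("clssnmvDiskCheck: No I/O completions for 200880 ms "
                           "for voting file /dev/sdb8)"))
```

Both values land close to the 200 second long disk timeout mentioned in the annotations.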
Troubleshooting Scenarios
Node Eviction Triage
• Time synchronisation issue
• Cluster Time Synchronisation Services daemon
– Provides time management in a cluster for Oracle.
• Observer mode when vendor time synchronisation s/w is found
– Logs time difference to the CRS alert log
• Active mode when no vendor time synchronisation s/w is found
Troubleshooting Scenarios
Node Eviction Triage
• Cluster Ready Services Daemon
– The CRSD daemon is primarily responsible for maintaining the availability of application resources, such as database instances. CRSD is responsible for starting and stopping these resources, relocating them when required to another node in the event of failure, and maintaining the resource profiles in the OCR (Oracle Cluster Registry). In addition, CRSD is responsible for overseeing the caching of the OCR for faster access, and also backing up the OCR.
– Log file is GI_HOME/log/<node>/crsd/crsd.log
• Rotation policy 10-50M
• Retention policy 10 logs
• Dynamic in 12.1 and can be changed
Troubleshooting Scenarios
Node Eviction Triage
• CRSD oraagent
– CRSD’s oraagent manages
• all database, instance, service and diskgroup resources
• node listeners
• SCAN listeners, and ONS
– If the Grid Infrastructure owner is different from the RDBMS home owner then you would have 2 oraagents, each running as one of the installation owners. The database and service resources would be managed by the RDBMS home owner and other resources by the Grid Infrastructure home owner.
– Log file is
• GI_HOME/log/<node>/agent/crsd/oraagent_<user>/oraagent_<user>.log
Troubleshooting Scenarios
Node Eviction Triage
• CRSD orarootagent
– CRSD’s rootagent manages
• GNS and its VIP
• Node VIP
• SCAN VIP
• network resources.
– Log file is
• GI_HOME/log/<node>/agent/crsd/orarootagent_root/orarootagent_root.log
Troubleshooting Scenarios
Node Eviction Triage
• Agent return codes
– The check entry point must return one of the following return codes:
• ONLINE
• UNPLANNED_OFFLINE
– Target=online, may be recovered or failed over
• PLANNED_OFFLINE
• UNKNOWN
– Cannot determine; if previously online or partial, then monitor
• PARTIAL
– Some of a resource’s services are available. Instance up but not open.
• FAILED
– Requires clean action
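For scripts that post-process agent logs, the check return codes can be modelled as a small enumeration with the follow-up each state implies on the slide. This is only a sketch: the enum values are plain strings chosen for illustration, not the numeric constants the agent framework uses internally, and `follow_up` is a hypothetical helper.

```python
from enum import Enum


class CheckResult(Enum):
    ONLINE = "ONLINE"
    UNPLANNED_OFFLINE = "UNPLANNED_OFFLINE"
    PLANNED_OFFLINE = "PLANNED_OFFLINE"
    UNKNOWN = "UNKNOWN"
    PARTIAL = "PARTIAL"
    FAILED = "FAILED"


# Follow-up per state, paraphrased from the slide; states without a note map to None.
FOLLOW_UP = {
    CheckResult.UNPLANNED_OFFLINE: "target is online: may be recovered or failed over",
    CheckResult.UNKNOWN: "cannot determine: keep monitoring if previously online or partial",
    CheckResult.PARTIAL: "some services available, e.g. instance up but not open",
    CheckResult.FAILED: "requires clean action",
}


def follow_up(result):
    """Return the slide's note for a check result, or None if there is none."""
    return FOLLOW_UP.get(result)


if __name__ == "__main__":
    for state in CheckResult:
        print(state.value, "->", follow_up(state))
```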
Troubleshooting Scenarios
Automatic Diagnostic Repository (ADR)
§ Important logs and traces
§ 11.2 – Only databases use ADR
• Grid Infrastructure files in $GI_HOME/log/<node_name>/<component_name>
– $GI_HOME/log/myHost/cssd
– $GI_HOME/log/myHost/alertmyHost.log
§ 12c – Grid Infrastructure and Database use ADR
§ Different locations for Grid Infrastructure and Databases
§ Grid Infrastructure
• alert.log, cssd.log, crsd.log, etc.
§ Databases
• alert.log, background process traces, foreground process traces
Oracle's Database and Clusterware Tools
• What if issues were detected before they had an impact?
• What if you were notified with a specific diagnosis and corrective actions?
• What if resource bottlenecks threatening SLAs were identified early?
• What if bottlenecks could be automatically relieved just in time?
• What if database hangs and node reboots could be eliminated?
Tools: Hang Manager, Trace File Analyzer, Quality of Service Management, Cluster Health Advisor, EXAchk, Memory Guard, Cluster Health Monitor, ORAchk, Cluster Verification Utility
Oracle 12c ORAchk & EXAchk
Maintains Compliance with Best Practices and Alerts Vulnerabilities to Known Issues
Why Oracle ORAchk & EXAchk
• Automatic proactive warning of problems before they impact you
• Get scheduled health reports sent to you in email
• Health checks for the most impactful recurring problems
• Runs in your environment with no need to send anything to Oracle
• Findings can be integrated into other tools of choice
• Common framework: EXAchk for Engineered Systems, ORAchk for Non-Engineered Systems
Oracle Stack Coverage
• Oracle Database Appliance
• Oracle Engineered Systems
– Oracle Exadata Database Machine
– Oracle SuperCluster / MiniCluster
– Oracle Private Cloud Appliance
– Oracle Big Data Appliance
– Oracle Exalogic Elastic Cloud
– Oracle Exalytics In-Memory Machine
– Oracle Zero Data Loss Recovery Appliance
• Oracle ASR
• Oracle Systems
– Oracle Solaris
– Cross stack checks
– Solaris Cluster
– OVN
• Oracle Database
– Standalone Database
– Grid Infrastructure & RAC
– Maximum Availability Architecture (MAA) Scorecard
– Upgrade Readiness Validation
– Golden Gate
– Oracle Restart
– Application Continuity
• Oracle Enterprise Manager Cloud Control
– Repository
– Agent
– OMS
• Oracle E-Business Suite
– Oracle Payables
– Oracle Workflow
– Oracle Purchasing
– Oracle Process Manufacturing
– Oracle Order Management
– Oracle Receivables
– Oracle Fixed Assets
– Oracle HCM
– Oracle CRM
– Oracle Project Billing
• Oracle Siebel
– Database best practices
• Oracle PeopleSoft
– Database best practices
• Oracle SAP
– Exadata best practices
• Oracle Middleware
• Oracle Identity and Access Management Suite (Oracle IAM)
Profiles (EXAchk)
• Profiles provide logical grouping of checks which are about similar topics
• Run only checks in a specific profile
./exachk -profile <profile>
• Run everything except checks in a specific profile
./exachk -excludeprofile <profile>
Profile               Description
asm                   ASM checks
avdf                  Audit Vault configuration checks
clusterware           Oracle Clusterware checks
control_VM            Checks only for Control VM (ec1-vm, ovmm, db, pc1, pc2). No cross node checks
corroborate           Exadata checks that need further review by the user to determine pass or fail
dba                   DBA checks
ebs                   Oracle E-Business Suite checks
eci_healthchecks      Enterprise Cloud Infrastructure health checks
ecs_healthchecks      Enterprise Cloud System health checks
goldengate            Oracle GoldenGate checks
hardware              Hardware specific checks for Oracle Engineered Systems
maa                   Maximum Availability Architecture checks
ovn                   Oracle Virtual Networking
platinum              Platinum certification checks
preinstall            Pre-installation checks
prepatch              Checks to execute before patching
security              Security checks
solaris_cluster       Solaris Cluster checks
storage               Oracle Storage Server checks
switch                InfiniBand switch checks
sysadmin              Sysadmin checks
user_defined_checks   Run user defined checks from user_defined_checks.xml
Profiles (ORAchk)
• Profiles provide logical grouping of checks which are about similar topics
• Run only checks in a specific profile
./orachk -profile <profile>
• Run everything except checks in a specific profile
./orachk -excludeprofile <profile>
Profile               Description
asm                   ASM checks
bi_middleware         Oracle Business Intelligence checks
clusterware           Oracle Clusterware checks
dba                   DBA checks
ebs                   Oracle E-Business Suite checks
emagent               Cloud Control agent checks
emoms                 Cloud Control management server checks
em                    Cloud Control checks
goldengate            Oracle GoldenGate checks
hardware              Hardware specific checks for Oracle Engineered Systems
oam                   Oracle Access Manager checks
oim                   Oracle Identity Manager checks
oud                   Oracle Unified Directory server checks
ovn                   Oracle Virtual Networking
peoplesoft            PeopleSoft best practices
preinstall            Pre-installation checks
prepatch              Checks to execute before patching
security              Security checks
siebel                Siebel checks
solaris_cluster       Solaris Cluster checks
storage               Oracle Storage Server checks
switch                InfiniBand switch checks
sysadmin              Sysadmin checks
user_defined_checks   Run user defined checks from user_defined_checks.xml
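The two profile switches differ only in the flag, so a small wrapper can build either form for ORAchk or EXAchk. This is a sketch that prints the command rather than running it; the function name and the `include`/`exclude` keywords are illustrative, while `-profile` and `-excludeprofile` are the flags shown on the Profiles slides.

```shell
#!/bin/sh
# Sketch: build an orachk/exachk command line for one profile.
build_profile_cmd() {
  tool=$1      # orachk or exachk
  mode=$2      # include | exclude (illustrative keywords)
  profile=$3   # a profile name from the tables above
  case $mode in
    include) flag=-profile ;;
    exclude) flag=-excludeprofile ;;
    *) echo "mode must be include or exclude" >&2; return 1 ;;
  esac
  echo "./$tool $flag $profile"
}

build_profile_cmd orachk include dba
build_profile_cmd exachk exclude switch
```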
Keep Track of Changes to the Attributes of Important Files
• Track changes to the attributes of important files with -fileattr
– Looks at all files & directories within Grid Infrastructure and Database homes by default
– The list of monitored directories and their contents can be configured to your specific requirements
– Use -fileattr start to take the first snapshot:
$ ./orachk -fileattr start
CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to
/u01/app/11.2.0.4/grid?[y/n][y]
Checking ssh user equivalency settings on all nodes in cluster
Node mysrv22 is configured for ssh user equivalency for oradb user
Node mysrv23 is configured for ssh user equivalency for oradb user
List of directories(recursive) for checking file attributes:
/u01/app/oradb/product/11.2.0/dbhome_11203
/u01/app/oradb/product/11.2.0/dbhome_11204
orachk has taken snapshot of file attributes for above directories at:
/orahome/oradb/orachk/orachk_mysrv21_20170504_041214
Keep Track of Changes to the Attributes of Important Files
• Compare current attributes against the first snapshot using -fileattr check
$ ./orachk -fileattr check -includedir "/root/myapp/config" -excludediscovery
CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to
/u01/app/12.2.0/grid?[y/n][y]
Checking for prompts on myserver18 for oragrid user...
Checking ssh user equivalency settings on all nodes in cluster
Node myserver17 is configured for ssh user equivalency for root user
List of directories(recursive) for checking file attributes:
/root/myapp/config
Checking file attribute changes...
.
"/root/myapp/config/myappconfig.xml" is different:
Baseline : 0644 oracle root /root/myapp/config/myappconfig.xml
Current  : 0644 root root /root/myapp/config/myappconfig.xml
…etc
…etc
Note:
• Use the same arguments with check that you used with start
• Will proceed to perform standard health checks after attribute checking
• File attribute changes will also show in the HTML report output
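Because check must be run with the same arguments as start, it helps to keep those arguments in one place. The sketch below stores them in a variable and prints both commands; the variable and function names are illustrative, and the directory is the one from the example output above.

```shell
#!/bin/sh
# Sketch: keep the -fileattr arguments in one variable so that
# "start" and "check" are always issued with identical options.
FILEATTR_ARGS='-includedir "/root/myapp/config" -excludediscovery'

fileattr_cmd() {
  case $1 in
    start|check) echo "./orachk -fileattr $1 $FILEATTR_ARGS" ;;
    *) echo "usage: fileattr_cmd start|check" >&2; return 1 ;;
  esac
}

fileattr_cmd start
fileattr_cmd check
```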
Improve Performance of SQL Queries
• Many new checks focus on known issues in the 12c Optimizer as well as SQL Plan Management
• All contained in the dba profile:
-profile dba
• These checks target problems such as:
– Wrong results returned
– High memory & CPU usage
– Errors such as ORA-00600 or ORA-07445
– Issues with cursor usage
– Other general SQL plan management problems
Oracle Database Security Assessment Tool (DBSAT) Included
• DBSAT analyzes database configurations and security policies
• Uncovers security risks
• Improves the security posture of Oracle Databases
All results are included within the report output under the check:
Validate database security configuration using database security assessment tool
Upgrade to Database 12.2 with Confidence
• New checks to help when upgrading the database to 12.2
• Both pre and post upgrade verification to prevent problems related to:
– OS configuration
– Grid Infrastructure & Database patch prerequisites
– Database configuration
– Cluster configuration
• Pre upgrade: -u -o pre
• Post upgrade: -u -o post
Oracle Health Checks Collection Manager
• New Collection Manager app built on the APEX 5 theme
• Tabs replaced with drop-down menus for easier navigation
• ORAchk & EXAchk continue to ship with the APEX 4 app too
• No more new functionality in the APEX 4 app; all new features will go into the APEX 5 app
Enterprise Manager Integration
• Check results integrated into the EM compliance framework via plugin
• View results in native EM compliance dashboards
• Related checks grouped into compliance standards
• View targets checked, violations & average score
• Drill down into a compliance standard to see individual check results
• View breakdown by target
Provision
• Use the Enterprise Manager provisioning feature and select ORAchk/EXAchk
• Once selected, this launches the provisioning wizard; choose the system type
View Results by Compliance Standard
• Filter by "Exachk%"
• Drill into the applicable standard and view individual checks & target status
• Click individual checks for recommendation details
JSON Output to Integrate with Kibana, Elasticsearch, etc.
• The JSON provides many tags to allow dashboard filtering based on facts such as:
– Engineered System type
– Engineered System version
– Hardware type
– Node name
– OS version
– Rack identifier
– Rack type
– Database version
– And more...
• Kibana can be used to view health check compliance across your data center
• Results can also be filtered based on any combination of exposed system attributes
Oracle 12c Trace File Analyzer
Speeds Issue Diagnosis, Triage and Resolution
Why TFA?
• Provides one interface for all diagnostic needs
• Collects data across the cluster and consolidates it in one place
• Collects all relevant diagnostic data at the time of the problem
• Reduces the time required to obtain diagnostic data, which saves your business money
Supported Platforms and Versions
• All Oracle Database & Grid versions 10.2+ are supported
• All major operating systems are supported
– Linux (OEL, RedHat, SUSE, Itanium & zLinux)
– Oracle Solaris (SPARC & x86-64)
– AIX
– HPUX (Itanium & PA-RISC)
– Windows
• You probably already have TFA installed, as it is included with:
– Oracle Grid Infrastructure 11.2.0.4+, 12.1.0.2+, 12.2.0.1+
– Oracle Database 12.2.0.1+
• Updated quarterly via document 1513912.1
OS versions supported are the same as those supported by the Database
Java Runtime Edition 1.8 required
Linux/Unix Installation
Root/Daemon Install
1. Download from 1513912.1
2. Copy to one required machine and unzip
3. Run
./installTFA<platform>
Will:
– Install on all nodes
– Auto discover relevant Oracle software & Exadata Storage Servers
– Start monitoring for problems & perform auto collections
Non-root/Non-Daemon Install
1. Download from 1513912.1
2. Copy to every required machine and unzip
3. Run
./installTFA<platform> -extractto <install_dir> -javahome <jre_home>
Will:
– Only install on the current host
– Not do automatic collections
– Not collect from remote hosts
– Not collect files unreadable by the install user
Recommended install location: /opt/oracle.tfa
Architecture
[Diagram: tfactl on the initiator node (where the command originated), a TFA daemon on that node and on remote nodes 1 to n, each with scripts and alerts & log files, feeding a cluster-wide collection]
• A TFA daemon runs on each cluster node, or on a single instance when no Grid Infrastructure is used
• Command line communication is via the tfactl command
• TFA daemons on all nodes coordinate:
– Script execution
– Collection of diagnostics
– Trimming of log contents
• Cluster-wide collection output is consolidated on one node
The daemon is only used when installed as root
Automatic Diagnostic Collections
When a significant problem occurs in Oracle Grid Infrastructure & Database(s), Oracle Trace File Analyzer will:
1. Automatically detect the event
2. Collect & package relevant diagnostics
3. Notify the relevant DBA and/or SysAdmin by email
4. Upload the collection to Oracle Support for further help
Command Interfaces
Command line
• Specify all command options at the command line
tfactl <command>
Shell
1. Set and change context
2. Run commands from within the shell
tfactl
tfactl > database MyDB
MyDB tfactl > oratop
Menu
1. Select menu navigation options, then choose the command you want to run
tfactl menu
Maintain
• Option 1
– Applying standard PSUs will automatically update TFA
– PSUs do not contain Support Tools Bundle updates
• Option 2
– To update with the latest TFA & Support Tools Bundle
1. Download the latest version: 1513912.1
2. Repeat the same installation steps
Upgrade to the latest version whenever possible to include bug fixes, new features & optimizations
View System & Cluster Summary
• Quick summary of the status of key components
• Choose an option to drill down further
Summary ASM Drill Down Example
• ASM overview: ASM cluster-wide summary, problems found
• ASM cluster-wide status: problems found on myserver69
• Also a disk space warning on both servers
Summary ASM Drill Down Example (continued)
• View node-wise & drill into myserver69
• View ASM problems for myserver69
• View ASM status summary for myserver69
• View recent problems detected
• View component status
Investigate Logs & Look for Errors
• Analyze all important recent log entries:
tfactl analyze -last 1d
• Search recent log entries:
tfactl analyze -search "ora-00600" -last 8h
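When several error strings are of interest, the same search can be repeated per pattern over one time window. The sketch below prints one `tfactl analyze -search` command per pattern using only the options shown above; the function name and the second pattern are illustrative.

```shell
#!/bin/sh
# Sketch: print one "tfactl analyze -search" command per pattern,
# all restricted to the same time window.
analyze_cmds() {
  window=$1   # e.g. 8h or 1d
  shift
  for pattern in "$@"; do
    printf 'tfactl analyze -search "%s" -last %s\n' "$pattern" "$window"
  done
}

analyze_cmds 8h ora-00600 ora-07445
```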
Perform Analysis Using the Included Tools
Tool / Description
• orachk or exachk: Provides health checks for the Oracle stack. Oracle Trace File Analyzer will install either
– Oracle EXAchk for Engineered Systems, see document 1070954.1 for more details, or
– Oracle ORAchk for all non-Engineered Systems, see document 1268927.2 for more details
• oswatcher: Collects and archives OS metrics. These are useful for instance or node evictions & performance issues. See document 301137.1 for more details
• procwatcher: Automates & captures database performance diagnostics and session level hang information. See document 459694.1 for more details
• oratop: Provides near real-time database monitoring. See document 1500864.1 for more details
• sqlt: Captures SQL trace data useful for tuning. See document 215187.1 for more details
• alertsummary: Provides a summary of events for one or more database or ASM alert files from all nodes
• ls: Lists all files TFA knows about for a given file name pattern across all nodes
• pstack: Generates a process stack for specified processes across all nodes
• grep: Searches alert or trace files with a given database and file name pattern, for a search string
• summary: Provides a high level summary of the configuration
• vi: Opens alert or trace files for a given database and file name pattern in the vi editor
• tail: Runs a tail on alert or trace files for a given database and file name pattern
• param: Shows all database and OS parameters that match a specified pattern
• dbglevel: Sets and unsets multiple CRS trace levels with one command
• history: Shows the shell history for the tfactl shell
• changes: Reports changes in the system setup over a given time period. This includes database parameters, OS parameters and patches applied
• calog: Reports major events from the Cluster Event log
• events: Reports warnings and errors seen in the logs
• managelogs: Shows disk space usage and purges ADR log and trace files
• ps: Finds processes
• triage: Summarizes oswatcher/exawatcher data
Not all tools are included in the Grid or Database install. Download from 1513912.1 to get the full collection of tools.
Verify which tools you have installed: tfactl toolstatus
OSWatcher (Support Tools Bundle)
Collect & Archive OS Metrics
• Executes standard UNIX utilities (e.g. vmstat, iostat, ps, etc.) at regular intervals
• Built-in Analyzer functionality to summarize, graph and report upon collected metrics
• Output is required for node reboot and performance issues
• Simple to install, extremely lightweight
• Runs on ALL platforms (except Windows)
• MOS Note: 301137.1 – OSWatcher Users Guide
Procwatcher (Support Tools Bundle)
Monitor & Examine Database Processes
• Single instance & RAC
• Generates session wait, lock and latch reports as well as call stacks from any problem process(es)
• Ability to collect stack traces of specific processes using Oracle tools and OS debuggers
• Typically reduces SR resolution time for performance related issues
• Runs on ALL major UNIX platforms
• MOS Note: 459694.1 – Procwatcher Install Guide
oratop (Support Tools Bundle)
Near Real-Time Database Monitoring
• Single instance & RAC
• Monitoring current database activities
• Database performance
• Identifying contention and bottlenecks
Analyze
• Each tool can be run using tfactl in shell mode
• Start the tfactl shell with
tfactl
• Run a tool with the tool name
tfactl > orachk
1. Where necessary, set context with database <dbname>
2. Then run the tool
tfactl > database MyDB
MyDB tfactl > oratop
3. Clear context with database
MyDB tfactl > database
One Command SRDCs
• For certain types of problems Oracle Support will ask you to run a Service Request Data Collection (SRDC)
• Previously this would have involved:
– Reading many different support documents
– Collecting output from many different tasks
– Gathering lots of different diagnostics
– Packaging & uploading
• Now just run:
tfactl diagcollect -srdc <srdc_type>
Faster & Easier SR Data Collection
tfactl diagcollect -srdc <srdc_type>
Type of Problem / SRDC Types / Collection Scope
• ORA errors
– SRDC types: ORA-00600, ORA-00700, ORA-04030, ORA-04031, ORA-07445, ORA-27300 (New), ORA-27301 (New), ORA-27302 (New)
– Collection scope: Local only
• Other internal database errors
– SRDC types: internalerror
– Collection scope: Local only
• Database performance problems
– SRDC types: dbperf
– Collection scope: Cluster wide
• Database patching problems
– SRDC types: dbpatchinstall (New), dbpatchconflict (New)
– Collection scope: Local only
• Database install/upgrade problems
– SRDC types: dbinstall (New), dbupgrade (New)
– Collection scope: Local only
• Enterprise Manager tablespace usage metric problems
– SRDC types: emtbsmetrics
– Collection scope: Local only (on EM Agent target)
• Enterprise Manager general metrics page or threshold problems - run all three SRDCs
– SRDC types: emdebugon, emdebugoff
– Collection scope: Local only (on EM Agent target & OMS)
– SRDC types: emmetricalert (New)
– Collection scope: Local only (on EM Agent target & Repository DB)
One Command SRDCs – Examples of What's Collected
ORA-04031:
tfactl diagcollect -srdc ora4031
1. IPS Package
2. Patch Listing
3. AWR report
4. Memory information
5. RDA
Database Performance:
tfactl diagcollect -srdc dbperf
1. ADDM report
2. AWR for good and problem period
3. AWR Compare Period report
4. ASH report for good and problem period
5. OSWatcher
6. IPS Package (if errors during problem period)
7. ORAchk (performance related checks)
Manual Data Gathering vs One Command SRDC
Manual Data Gathering
1. Generate ADDM reviewing Document 1680075.1
2. Identify "good" and "problem" periods and gather AWR reviewing Document 1903158.1
3. Generate AWR compare report (awrddrpt.sql) using "good" and "problem" periods
4. Generate ASH report for "good" and "problem" periods reviewing Document 1903145.1
5. Collect OSWatcher data reviewing Document 301137.1
6. Check alert.log for any errors during the "problem" period
7. Find any trace files generated during the "problem" period
8. Collate and upload all the above files/outputs to the SR
TFA SRDC
1. Run tfactl diagcollect -srdc dbperf
2. Upload the resulting zip file to the SR
One Command SRDC
Interactive Mode
tfactl diagcollect -srdc <srdc_type>
1. Enter default for event date/time and database name
2. Scans the system to identify the 10 most recent events (ORA-600 example shown)
3. Once the relevant event is chosen, proceeds with diagnostic collection
4. All required files are identified
5. Trimmed where applicable
6. Packaged in a zip ready to provide to support
One Command SRDC
Silent Mode
tfactl diagcollect -srdc <srdc_type> -database <db> -for <time>
1. Parameters (date/time, DB name) are provided in the command
2. Does not prompt for any more information
3. All required files are identified
4. Trimmed where applicable
5. Packaged in a zip ready to provide to support
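Silent mode suits scripting because every input is on the command line. The sketch below assembles such a command and refuses to build one if an argument is missing, since a missing value would trigger interactive prompts. The function name, the database name, and the timestamp are hypothetical; the accepted format of the `-for` value is not stated on the slide, so it is passed through unchanged.

```shell
#!/bin/sh
# Sketch: build a silent-mode SRDC command; all three values are required.
srdc_silent_cmd() {
  srdc_type=$1   # e.g. ora600, ora4031, dbperf
  db=$2          # database name
  when=$3        # event time, passed through as-is (format not validated)
  if [ -z "$srdc_type" ] || [ -z "$db" ] || [ -z "$when" ]; then
    echo "usage: srdc_silent_cmd <srdc_type> <db> <time>" >&2
    return 1
  fi
  printf 'tfactl diagcollect -srdc %s -database %s -for "%s"\n' \
    "$srdc_type" "$db" "$when"
}

# Hypothetical database name and timestamp, for illustration only.
srdc_silent_cmd ora600 MyDB "2017-05-04 04:12:14"
```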
Default Collection
• Run a default diagnostic collection if there is not yet an SRDC about your problem:
tfactl diagcollect
• Will trim & collect all important log files updated in the past 12 hours
• Collections are stored in the repository directory
• Change the diagcollect timeframe with -last <n>h|d
Automatic Database Log Purge
• TFA can automatically purge database logs
– OFF by default
– Except on a Domain Service Cluster (DSC), where it is ON by default
• Turn auto purging on or off:
tfactl set manageLogsAutoPurge=<ON|OFF>
• Will remove logs older than 30 days
– configurable with:
tfactl set manageLogsAutoPurgePolicyAge=<n><d|h>
• Purging runs every 60 minutes
– configurable with:
tfactl set manageLogsAutoPurgeInterval=<minutes>
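The three auto-purge settings are usually changed together, so it can be useful to validate the values once and emit all three commands. This sketch prints the `tfactl set` commands from the slide after checking that the age looks like `<n><d|h>` and the interval is a whole number of minutes; the function name and error messages are illustrative.

```shell
#!/bin/sh
# Sketch: validate the retention age and interval, then print the
# three "tfactl set" commands that enable automatic purging.
autopurge_cmds() {
  age=$1        # retention, <n><d|h>, e.g. 30d
  interval=$2   # minutes between purge runs, e.g. 60
  num=${age%[dh]}
  case $num in
    ''|*[!0-9]*) echo "age must look like 30d or 12h" >&2; return 1 ;;
  esac
  [ "$num" != "$age" ] || { echo "age needs a d or h suffix" >&2; return 1; }
  case $interval in
    ''|*[!0-9]*) echo "interval must be whole minutes" >&2; return 1 ;;
  esac
  echo "tfactl set manageLogsAutoPurge=ON"
  echo "tfactl set manageLogsAutoPurgePolicyAge=$age"
  echo "tfactl set manageLogsAutoPurgeInterval=$interval"
}

autopurge_cmds 30d 60
```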
Manual Database Log Purge
• TFA can manage ADR log and trace files
– Show disk space usage of individual diagnostic destinations
– Purge these file types based on diagnostic location and/or age:
• "ALERT", "INCIDENT", "TRACE", "CDUMP", "HM", "UTSCDMP", "LOG"
tfactl managelogs <options>
Option / Description
• -show usage: Shows disk space usage per diagnostic directory for both GI and database logs
• -show variation -older <n><m|h|d>: Use to determine per-directory disk space growth. Shows the disk usage variation for the specified period per directory.
• -purge -older <n><m|h|d>: Removes all ADR files under the GI_BASE directory which are older than the time specified
• -gi: Restricts the command to only diagnostic files under the GI_BASE
• -database [all|dbname]: Restricts the command to only diagnostic files under the database directory. Defaults to all; alternatively specify a database name
• -dryrun: Use with -purge to estimate how many files will be affected and how much disk space will be freed by a potential purge command
Notes:
• Runs as the ADR home owner, so it will only be able to purge files this owner has permission to delete
• May take a while for a large number of files
Manual Database Log Purge
tfactl managelogs -show variation -older <n><m|h|d>
tfactl managelogs -show usage
• Use -gi to only show Grid Infrastructure
• Use -database to only show database
Manual Database Log Purge
tfactl managelogs -purge -older <n><m|h|d> -dryrun
tfactl managelogs -purge -older <n><m|h|d>
• Use -dryrun for a "what if"
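A cautious purge pairs the two commands above: run the dry run first, review the estimate, then run the identical command without `-dryrun`. The sketch below prints both from a single set of arguments so they cannot drift apart; the function name is illustrative and the optional scope argument is assumed to be one of the `-gi` or `-database` options from the options table.

```shell
#!/bin/sh
# Sketch: print the dry-run command and the matching real purge command.
purge_plan() {
  older=$1   # age threshold, <n><m|h|d>
  scope=$2   # optional: "-gi" or "-database <name>"
  base="tfactl managelogs -purge -older $older${scope:+ $scope}"
  echo "$base -dryrun"
  echo "$base"
}

purge_plan 30d -gi
```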
Disk Usage Snapshots
• TFA will track disk usage and record snapshots to:
– tfa/repository/suptools/<node>/managelogs/usage_snapshot/
• A snapshot happens every 60 minutes, configurable with:
tfactl set diskUsageMonInterval=<minutes>
• Disk usage monitoring is ON by default, configurable with:
tfactl set diskUsageMon=<ON|OFF>
Collect
• Trim & collect all important log files updated in the past 12 hours: tfactl diagcollect
• Collect a problem-specific Service Request Data Collection (SRDC): tfactl diagcollect -srdc ora600
• Collections are stored in the repository directory
• Change the diagcollect timeframe with -since <n>h|d
• For a list of the types of SRDC collections use tfactl diagcollect -srdc help
TFA dbglevel profiles
• Example
– tfactl dbglevel -set node_eviction
– would be used for enhancing diagnostics when node evictions are being investigated, and would perform the following operations internally
• crsctl set log css "CSSD=4"
• crsctl set log css "CSSDNMC=4"
• crsctl set log css "CLSF=4"
• crsctl set log css "CSSDGMCC=4"
• crsctl set log css "CSSDGMPC=4"
• To revert to the original or default logging levels, the following command
– $ tfactl dbglevel -unset node_eviction
– would perform the following operations internally
• crsctl set log css "CSSD=2"
• crsctl set log css "CSSDNMC=2"
• crsctl set log css "CLSF=0"
• crsctl set log css "CSSDGMCC=2"
• crsctl set log css "CSSDGMPC=2"
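To see what the node_eviction profile saves you from typing, the sketch below prints the individual commands that the slide lists for the set and unset cases. The component names and levels are copied from the slide; the function name is illustrative, and the spelling `crsctl set log css` is an assumption about the run-together command text in the original.

```shell
#!/bin/sh
# Sketch: list the per-component commands behind
# "tfactl dbglevel -set|-unset node_eviction", as given on the slide.
node_eviction_cmds() {
  case $1 in
    set)   levels="CSSD=4 CSSDNMC=4 CLSF=4 CSSDGMCC=4 CSSDGMPC=4" ;;
    unset) levels="CSSD=2 CSSDNMC=2 CLSF=0 CSSDGMCC=2 CSSDGMPC=2" ;;
    *) echo "usage: node_eviction_cmds set|unset" >&2; return 1 ;;
  esac
  # Word splitting on $levels is intentional: one entry per component.
  for lv in $levels; do
    printf 'crsctl set log css "%s"\n' "$lv"
  done
}

node_eviction_cmds set
```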
Incident Based Collections with SRDC
Incident Type / Description
• ora4030: For ORA-04030 errors
• ora4031: For ORA-04031 errors
• dbperf: For basic db performance problems
• ora600: For ORA-00600 errors
• ora700: For ORA-00700 errors
• ora7445: For ORA-07445 errors
• Use srdc <incident type>: tfactl srdc ora4030
• To specify the SID use -sid <oraclesid>
• To specify the database use -db <dbname>
• To specify the incident date & time use -inc_date <YYYY-MM-DD> -inc_time <HH:MM:SS>
• To upload directly to the SR use -sr <SR#>
tfactl srdc ora4030 -sid orcl -db RDBMS121 \
-inc_date 2016-06-15 -inc_time 02:48:23 \
-sr 3-123456789
• For dbperf use these parameters to specify the good & bad performance periods to compare:
Parameter / Description
• perf_base_sd: Start date for a good performance period
• perf_base_st: Start time for a good performance period
• perf_base_ed: End date for a good performance period
• perf_base_et: End time for a good performance period
• perf_comp_sd: Start date for a bad performance period
• perf_comp_st: Start time for a bad performance period
• perf_comp_ed: End date for a bad performance period
• perf_comp_et: End time for a bad performance period
tfactl srdc dbperf -db RDBMS121 \
-perf_base_sd 2016-06-15 -perf_base_st 01:30:00 \
-perf_base_ed 2016-06-15 -perf_base_et 02:00:00 \
-perf_comp_sd 2016-06-16 -perf_comp_st 09:30:00 \
-perf_comp_ed 2016-06-16 -perf_comp_et 10:00:00
Oracle 12c Cluster Health Monitor
Generates Diagnostic Metrics View of Cluster and Databases
Cluster Health Monitor (CHM)
Generates Diagnostic Metrics View of Cluster and Databases
• Always on - Enabled by default
• Provides detailed OS resource metrics
• Assists node eviction analysis
• Locally logs all process data
• User can define pinned processes
• Supports plug-in collectors (ex. traceroute, netstat, ping, etc.)
• Listens to CSS and GIPC events
• Categorizes processes by type
• New CSV output for ease of analysis
[Diagram: osysmond on each node sends OS data to ologgerd (master), which stores it in the GIMR, the 12c Grid Infrastructure Management Repository]
Cluster Health Monitor (CHM)
oclumon CLI or Full Integration with EM Cloud Control
Oracle 12c Cluster Health Advisor
Discovers Potential Cluster & DB Problems - Notifies with Corrective Actions
Cluster Health Advisor example (CHA/DB Health: I/O problem; databases proddb_1, proddb_2)
CHA has detected a service degradation due to higher than expected I/O latencies.
• Problem: The degradation is caused by a higher than expected utilization of shared storage devices for this database. No evidence of significant increase in I/O demand on the local node.
• Confidence: 95.17%
• Action: Validate whether there is an increase in I/O demand on nodes other than the local node and find I/O intensive SQL. Add more disks to the disk group or move the database to faster disks.
Cluster Health Advisor Daemon
Dependencies to the Grid Infrastructure Management Repository (GIMR)
Command Line Tool - chactl
Cluster Health Advisor
• Will only monitor the cluster initially
• Tell it to monitor the database:
chactl monitor database -db <db_name>
Cluster Health Advisor - diagnosis
• Query the cluster diagnosis for incidents and recommendations
chactl query diagnosis
• Query a specific database for diagnosis
chactl query diagnosis -db <db_name>
• Query the repository footprint
chactl query repository
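The chactl commands from the last two slides form a natural sequence for a newly added database: start monitoring, then query its diagnosis and the repository footprint. The sketch below prints that sequence for a given database name; the function name and the example database name are illustrative, and the commands are exactly those shown on the slides.

```shell
#!/bin/sh
# Sketch: print the chactl commands to start monitoring a database
# and then review its diagnosis and the repository footprint.
cha_cmds() {
  db=$1
  [ -n "$db" ] || { echo "usage: cha_cmds <db_name>" >&2; return 1; }
  echo "chactl monitor database -db $db"
  echo "chactl query diagnosis -db $db"
  echo "chactl query repository"
}

cha_cmds proddb
```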
Oracle 12c Database Hang Manager
Autonomously Preserves Database Availability and Performance
Debugging Live Systems: Hangs
• Parsing the system state dump can be very time consuming. To debug a hang more quickly you could query v$session.blocking_session:
select sess.sid sid, substr(proc.program,0,25) prog,
       substr(sw.event,0,15) event, sw.wait_time wt,
       sess.blocking_session bsid
from v$process proc, v$session sess, v$session_wait sw
where proc.addr = sess.paddr
  and sess.status = 'ACTIVE'
  and sw.sid = sess.sid
order by prog;
SID   Program                   Event           WT  BSID
----- ------------------------- --------------- --- ----
2836  [email protected] (S000) enq: TM - conte 0   2979
2690  [email protected] (S001) enq: TM - conte 0   2979
2531  [email protected] (S002) enq: TM - conte 0   2979
2811  [email protected] (S003) enq: TM - conte 0   2979
2979  [email protected] (TNS V1- enq: TM - conte 0   2853
Debugging Live Systems: Hangs
• sqlplus -prelim "/ as sysdba" is useful because it avoids creating a process state object (PSO), which requires various resources such as latches.
• Trying to acquire those resources may cause your debugger session to hang.
• Some dumps/commands may require a PSO; you can therefore execute those dumps/commands in an existing process that already has a PSO.
$ sqlplus -prelim "/ as sysdba"
SQL> oradebug setorapid 9
SQL> oradebug dump systemstate 3
Oracle 12c Hang Manager
Autonomously Preserves Database Availability and Performance
• Always on - Enabled by default
• Reliably detects database hangs and deadlocks
• Autonomously resolves them
• Supports QoS Performance Classes, Ranks and Policies to maintain SLAs
• Logs all detections and resolutions
• New SQL interface to configure sensitivity (Normal/High) and trace file sizes
[Diagram: Hang Manager (DIA0) workflow. Labels: Session, DETECT, ANALYZE, Hung?, EVALUATE, QoS Policy, VERIFY, Victim]
Oracle 12c Hang Manager
Full Resolution Dump Trace File and DB Alert Log Audit Reports

Dump trace file:
Dump file …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
Oracle Database 12c Enterprise Edition Release 12.2.0.0.0 - 64bit Beta
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options
Build label: RDBMS_MAIN_LINUX.X64_151013
ORACLE_HOME: …/3775268204/oracle
System name: Linux
Node name: slc05kyr
Release: 2.6.39-400.211.1.el6uek.x86_64
Version: #1 SMP Fri Nov 15 13:39:16 PST 2013
Machine: x86_64
VM name: Xen Version: 3.4 (PVM)
Instance name: hm62
Redo thread mounted by this instance: 2
Oracle process number: 19
Unix process pid: 12656, image: [email protected] (DIA0)
*** 2015-10-13T16:47:59.541509+17:00
*** SESSION ID:(96.41299) 2015-10-13T16:47:59.541519+17:00
*** CLIENT ID:() 2015-10-13T16:47:59.541529+17:00
*** SERVICE NAME:(SYS$BACKGROUND) 2015-10-13T16:47:59.541538+17:00
*** MODULE NAME:() 2015-10-13T16:47:59.541547+17:00
*** ACTION NAME:() 2015-10-13T16:47:59.541556+17:00
*** CLIENT DRIVER:() 2015-10-13T16:47:59.541565+17:00

DB alert log on the instance running the master DIA0 process (instance 1):
2015-10-13T16:47:59.435039+17:00
Errors in file /oracle/log/diag/rdbms/hm6/hm6/trace/hm6_dia0_12433.trc (incident=7353):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: …/diag/rdbms/hm6/hm6/incident/incdir_7353/hm6_dia0_12433_i7353.trc
2015-10-13T16:47:59.506775+17:00
DIA0 requesting termination of session sid:40 with serial # 43179 (ospid:13031) on instance 2
due to a GLOBAL, HIGH confidence hang with ID=1.
Hang Resolution Reason: Automatic hang resolution was performed to free a
significant number of affected sessions.
DIA0: Examine the alert log on instance 2 for session termination status of hang with ID=1.

In the alert log on the instance local to the session (instance 2 in this case), we see the following:
2015-10-13T16:47:59.538673+17:00
Errors in file …/diag/rdbms/hm6/hm62/trace/hm62_dia0_12656.trc (incident=5753):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
2015-10-13T16:48:04.222661+17:00
DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) of hang with ID = 1
requested by master DIA0 process on instance 1
Hang Resolution Reason: Automatic hang resolution was performed to free a
significant number of affected sessions.
by terminating session sid:40 with serial # 43179 (ospid:13031)
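Because Hang Manager logs every resolution, the alert log can be scanned for its termination requests. The following is a rough Python sketch that pulls the session details out of the "DIA0 requesting termination" line. The regular expression is an assumption based only on the message wording in this 12.2 beta excerpt (other releases may word it differently), and `find_termination_requests` is a hypothetical helper name:

```python
import re

# Pattern modeled on the "requesting termination" line in the excerpt;
# treat the exact wording as an assumption for other releases.
REQUEST_RE = re.compile(
    r"DIA0 requesting termination of session sid:(\d+) with serial # (\d+) "
    r"\(ospid:(\d+)\) on instance (\d+)"
)


def find_termination_requests(lines):
    """Return one dict per DIA0 termination request found in the lines."""
    found = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            sid, serial, ospid, instance = map(int, match.groups())
            found.append(
                {"sid": sid, "serial": serial, "ospid": ospid, "instance": instance}
            )
    return found


# Lines copied from the alert log excerpt
sample = [
    "ORA-32701: Possible hangs up to hang ID=1 detected",
    "DIA0 requesting termination of session sid:40 with serial # 43179 (ospid:13031) on instance 2",
    "due to a GLOBAL, HIGH confidence hang with ID=1.",
]
requests = find_termination_requests(sample)
print(requests)
```

The instance number in each result indicates which instance's alert log holds the matching "DIA0 terminating blocker" entry.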
Oracle Domain Services Cluster (DSC)
Deploys with Minimum Footprint and Maximum Manageability
Oracle 12c Domain Services Cluster (DSC)
Deploys with Minimum Footprint and Maximum Manageability
• Hosts Framework as Services
• Reduces local resource footprint
• Centralizes management
• Speeds deployment and patching
• Optional Shared Storage
• Supports multiple versions and platforms going forward
[Diagram: Oracle Cluster Domain containing Application Member Clusters, Database Member Clusters, and the Oracle Domain Services Cluster]
Services shown in the Oracle Domain Services Cluster:
– Management Repository Service
– Trace File Analyzer Receiver
– ORAchk Collection Service
– Grid Names Service
– Storage Services
– Rapid Home Provisioning Service
Oracle Cluster Domain
[Diagram: member clusters connected over a Private Network to the Oracle Domain Services Cluster, with SAN and NAS storage and Shared ASM]
Member clusters shown:
– Database Member Cluster: Uses local ASM
– Application Member Cluster: GI only
– Database Member Cluster: Uses IO & ASM Service of DSC
– Database Member Cluster: Uses ASM Service
Oracle Domain Services Cluster services shown:
– Mgmt Repository (GIMR) Service
– Trace File Analyzer (TFA) Service
– Rapid Home Provisioning (RHP) Service
– Additional Optional Services
– ACFS Services
– ASM Service
– IO Service
Compare Database Status Before & After Upgrade
• Download dbupgdiag.sql from Doc ID 556610.1
• Run it both before and after the upgrade:
$ cd <location of the script>
$ sqlplus / as sysdba
SQL> alter session set nls_language='American';
SQL> @dbupgdiag.sql
SQL> exit
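Once the script has been run twice, the two reports still have to be compared. One possible way to do that, sketched in Python with the standard difflib module; the function name `diff_reports` and the report fragments below are made up for illustration and are not real dbupgdiag.sql output:

```python
import difflib


def diff_reports(before_text, after_text):
    """Return unified-diff lines between two report texts."""
    return list(
        difflib.unified_diff(
            before_text.splitlines(),
            after_text.splitlines(),
            fromfile="before_upgrade",
            tofile="after_upgrade",
            lineterm="",
        )
    )


# Hypothetical report fragments, for illustration only
before = "COMPONENT_A VALID\nCOMPONENT_B VALID\n"
after = "COMPONENT_A VALID\nCOMPONENT_B INVALID\n"

changes = diff_reports(before, after)
for line in changes:
    print(line)
```

In practice the two strings would be read from the output saved from each run; lines prefixed with "-" appear only in the pre-upgrade report and lines prefixed with "+" only in the post-upgrade report.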