System Event Log Troubleshooting Guide for Intel®
Miscellaneous Events
11. Miscellaneous Events
The miscellaneous events section addresses sensors not easily grouped with other sensor types.
11.1 IPMI Watchdog
PCSD server systems support an IPMI watchdog timer, which can be used to check whether the OS is still responsive. The timer is disabled by default and must be enabled manually. Once enabled, it requires an IPMI-aware utility running in the operating system to reset the timer before it expires.
If the timer does expire, the BMC can take action if it is configured to do so: reset, power down, power cycle, or generate a critical interrupt.
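As an illustration of what enabling the timer manually involves, the sketch below encodes an IPMI Set Watchdog Timer request (NetFn App 06h, command 24h) as an ipmitool raw command line. The field layout (timer use in byte 1, pre-timeout interrupt and timeout action in byte 2, pre-timeout interval in byte 3, countdown in 100 ms units in bytes 5 and 6) reflects our reading of the IPMI v2.0 specification and should be checked against it and against your platform's BMC documentation. The helper names set_watchdog_request and to_ipmitool_raw are hypothetical. Note that the timeout actions line up with trigger offsets 01h-03h in Table 77.

```python
# Hypothetical helper: builds the request data for the IPMI "Set Watchdog
# Timer" command (NetFn App = 06h, Cmd = 24h). The field layout is our reading
# of the IPMI v2.0 specification; verify it before using it on real hardware.

NETFN_APP = 0x06
CMD_SET_WATCHDOG_TIMER = 0x24

# Byte 1 [2:0]: timer use (same codes as Event Data 2 [3:0] in Table 76).
TIMER_USE = {"BIOS FRB2": 0x1, "BIOS/POST": 0x2, "OS Load": 0x3, "SMS/OS": 0x4, "OEM": 0x5}
# Byte 2 [2:0]: timeout action (matches trigger offsets 01h-03h in Table 77).
TIMEOUT_ACTION = {"none": 0x0, "hard reset": 0x1, "power down": 0x2, "power cycle": 0x3}
# Byte 2 [6:4]: pre-timeout interrupt (same codes as Event Data 2 [7:4]).
PRE_TIMEOUT_INTERRUPT = {"none": 0x0, "SMI": 0x1, "NMI": 0x2, "Messaging Interrupt": 0x3}


def set_watchdog_request(timer_use: str, action: str, countdown_s: float,
                         interrupt: str = "none", pre_timeout_s: int = 0) -> list[int]:
    """Return the six request data bytes for Set Watchdog Timer."""
    counts = round(countdown_s * 10)  # countdown is expressed in 100 ms units
    if not 0 < counts <= 0xFFFF:
        raise ValueError("countdown must fit in 16 bits of 100 ms units")
    if not (0 <= pre_timeout_s < countdown_s) or pre_timeout_s > 0xFF:
        raise ValueError("pre-timeout must be 0-255 s and shorter than the countdown")
    return [
        TIMER_USE[timer_use],  # byte 1: bit 7 left at 0 so the event is logged to the SEL
        (PRE_TIMEOUT_INTERRUPT[interrupt] << 4) | TIMEOUT_ACTION[action],  # byte 2
        pre_timeout_s,         # byte 3: pre-timeout interval in seconds
        0x00,                  # byte 4: do not clear any timer-use expiration flags
        counts & 0xFF,         # byte 5: initial countdown, LSB
        counts >> 8,           # byte 6: initial countdown, MSB
    ]


def to_ipmitool_raw(data: list[int]) -> str:
    """Format the request as an `ipmitool raw <netfn> <cmd> <data...>` command line."""
    fields = [NETFN_APP, CMD_SET_WATCHDOG_TIMER, *data]
    return "ipmitool raw " + " ".join(f"0x{b:02x}" for b in fields)


if __name__ == "__main__":
    request = set_watchdog_request("SMS/OS", "hard reset", 300)
    print(to_ipmitool_raw(request))
```

Here set_watchdog_request("SMS/OS", "hard reset", 300) asks for a 5-minute SMS/OS timer that hard-resets the system on expiry. As we understand the specification, the timer starts counting only after a subsequent Reset Watchdog Timer command (command 22h), which is also what the OS-side utility sends periodically to keep the timer from expiring.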
Table 76: IPMI Watchdog Sensor Typical Characteristics
Byte  Field                           Description
11    Sensor Type                     23h = Watchdog 2
12    Sensor Number                   03h
13    Event Direction and Event Type  [7] Event direction
                                        0b = Assertion Event
                                        1b = Deassertion Event
                                      [6:0] Event Type = 6Fh (Sensor Specific)
14    Event Data 1                    [7:6] – 11b = Sensor-specific event extension code in Event Data 2
                                      [5:4] – 00b = Unspecified Event Data 3
                                      [3:0] – Event Trigger Offset as described in Table 77
Revision 1.1 Intel order number G74211-002 75
Byte  Field                           Description
15    Event Data 2                    [7:4] – Interrupt type
                                        0h = None
                                        1h = SMI
                                        2h = NMI
                                        3h = Messaging Interrupt
                                        Fh = Unspecified
                                        All other = Reserved
                                      [3:0] – Timer use at expiration
                                        0h = Reserved
                                        1h = BIOS FRB2
                                        2h = BIOS/POST
                                        3h = OS Load
                                        4h = SMS/OS
                                        5h = OEM
                                        Fh = Unspecified
                                        All other = Reserved
16    Event Data 3                    Not used
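To make the bit layout in Table 76 concrete, here is a minimal Python sketch that decodes the watchdog-specific bytes of a raw 16-byte SEL record. It assumes the record is supplied as raw bytes numbered 1-16 as in the table (so byte N is at index N-1); the function and class names are hypothetical, and the trigger offset descriptions are transcribed from Table 77.

```python
from dataclasses import dataclass

# Code tables transcribed from Table 76 (Event Data 2) and Table 77.
INTERRUPT_TYPE = {0x0: "None", 0x1: "SMI", 0x2: "NMI", 0x3: "Messaging Interrupt", 0xF: "Unspecified"}
TIMER_USE = {0x1: "BIOS FRB2", 0x2: "BIOS/POST", 0x3: "OS Load", 0x4: "SMS/OS", 0x5: "OEM",
             0xF: "Unspecified"}
TRIGGER_OFFSET = {0x00: "Timer expired, status only", 0x01: "Hard reset", 0x02: "Power down",
                  0x03: "Power cycle", 0x08: "Timer interrupt"}


@dataclass
class WatchdogEvent:
    assertion: bool
    trigger: str
    interrupt_type: str
    timer_use: str


def decode_watchdog_event(record: bytes) -> WatchdogEvent:
    """Decode a 16-byte SEL record; Table 76 numbers bytes from 1, so byte N is record[N - 1]."""
    if len(record) != 16:
        raise ValueError("expected a 16-byte SEL record")
    sensor_type, dir_type, ed1, ed2 = record[10], record[12], record[13], record[14]
    if sensor_type != 0x23:
        raise ValueError("sensor type is not 23h (Watchdog 2)")
    if (dir_type & 0x7F) != 0x6F:
        raise ValueError("event type is not 6Fh (sensor specific)")
    offset = ed1 & 0x0F  # Event Data 1 [3:0]: event trigger offset
    trigger = TRIGGER_OFFSET.get(offset, f"Unknown offset {offset:02X}h")
    if (ed1 >> 6) == 0b11:  # Event Data 1 [7:6] = 11b: Event Data 2 holds the extension code
        interrupt = INTERRUPT_TYPE.get(ed2 >> 4, "Reserved")
        timer_use = TIMER_USE.get(ed2 & 0x0F, "Reserved")
    else:
        interrupt = timer_use = "Unspecified"
    return WatchdogEvent(
        assertion=(dir_type & 0x80) == 0,  # bit 7: 0b = assertion, 1b = deassertion
        trigger=trigger,
        interrupt_type=interrupt,
        timer_use=timer_use,
    )


if __name__ == "__main__":
    # Only bytes 11-16 matter here; bytes 1-10 are zero-filled for the example.
    sample = bytes(10) + bytes([0x23, 0x03, 0x6F, 0xC1, 0x04, 0xFF])
    print(decode_watchdog_event(sample))
```

For the sample record, Event Data 1 = C1h gives trigger offset 01h (Hard reset), and Event Data 2 = 04h reports no interrupt with the timer in SMS/OS use, which is the pattern expected when an OS-level watchdog utility stopped resetting the timer.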
Table 77: IPMI Watchdog Sensor Event Trigger Offset – Next Steps
Event Trigger Offset (Hex)  Description
00h                         Timer expired, status only
01h                         Hard reset
02h                         Power down
03h                         Power cycle
08h                         Timer interrupt
Description
Our server systems support a BMC watchdog timer, which can be used to check whether the OS is still responsive. The timer is disabled by default and must be enabled manually. Once enabled, it requires an IPMI-aware utility running in the operating system to reset the timer before it expires. If the timer does expire, the BMC can take action if it is configured to do so (reset, power down, power cycle, or generate a critical interrupt).
Next Steps
If this event is being logged, it is because the BMC has been configured to monitor the watchdog timer.
1. Make sure your OS supports this (typically through a third-party IPMI-aware utility such as ipmitool or ipmiutil, together with the OpenIPMI driver).
2. If it does, your OS has most likely hung; investigate the OS event logs to determine the cause.
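Next step 1 depends on something in the OS resetting the timer on schedule. The following Python sketch shows the idea by periodically invoking ipmitool's `mc watchdog reset` subcommand. The function names, the interval check, and driving ipmitool from a script are our assumptions for illustration; in practice a watchdog daemon working with the OpenIPMI driver normally does this job. The runner and sleep parameters exist only so the loop can be exercised without a BMC.

```python
import subprocess
import time
from typing import Callable, Sequence

# ipmitool's "mc watchdog reset" restarts the BMC watchdog countdown from its
# initial value. This keepalive loop is a hypothetical sketch, not a
# replacement for a proper watchdog daemon.
RESET_COMMAND = ("ipmitool", "mc", "watchdog", "reset")


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd), check=False).returncode


def keep_watchdog_alive(interval_s: float, countdown_s: float, iterations: int,
                        runner: Callable[[Sequence[str]], int] = run,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Reset the watchdog `iterations` times; return how many resets failed."""
    if not 0 < interval_s < countdown_s:
        # If resets are not more frequent than the countdown, the timer can
        # expire between them and the BMC takes its configured action.
        raise ValueError("reset interval must be shorter than the watchdog countdown")
    failures = 0
    for i in range(iterations):
        if runner(RESET_COMMAND) != 0:
            failures += 1
        if i < iterations - 1:
            sleep(interval_s)  # no need to wait after the final reset
    return failures
```

If resets start failing, or if the process itself stalls for longer than the countdown, the timer expires and, when logging is enabled, the BMC records the watchdog event described in this section.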