RTAS Error and Event Classes
The table below, "Error and Event Classes with RTAS Function Call Mask",
describes the predefined classes of error and event notifications that
can be presented through the
check-exception and
event-scan RTAS functions. More detailed descriptions
of these classes are given later in this chapter.
defines nodes in the OF device
tree which, through an
“interrupts” property, may list the
platform-dependent interrupts related to each class. From this information,
OSs know which interrupts may be handled by calling
check-exception. The OF structure for describing these
interrupts is defined in
.
This document also defines the mask parameter for the
check-exception and
event-scan RTAS functions, which limits the search for
errors and events to the classes specified.
Error and Event Classes with RTAS Function Call Mask

Class Type                         OF Node Name (where the         RTAS Function Call Mask
                                   “interrupts” property lists     (value = 1 enables class)
                                   the interrupts)
---------------------------------  ------------------------------  -------------------------
Internal Errors                    internal-errors                 bit 0
Environmental and Power Warnings   epow-events                     bit 1
Reserved                           -                               bit 2
Hot Plug Events                    hot-plug-events                 bit 3
I/O Events and Errors              ibm,io-events                   bit 4
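As an illustration of how an OS might build the mask parameter from the class bits listed above, the following is a minimal sketch. It assumes that the mask is a 32-bit word and that bits are numbered in the big-endian style, so that bit 0 is the most significant bit; this section does not state the numbering, so treat it as an assumption. The macro, enumerator, and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit mask, bit 0 is the most significant bit. */
#define RTAS_MASK_BIT(n) ((uint32_t)1 << (31 - (n)))

/* Bit positions taken from the class table; the names are hypothetical. */
enum {
    EVT_BIT_INTERNAL_ERRORS = 0,
    EVT_BIT_EPOW            = 1,
    EVT_BIT_RESERVED        = 2,
    EVT_BIT_HOT_PLUG        = 3,
    EVT_BIT_IO_EVENTS       = 4
};

/* Build a mask that enables every class whose bit position is listed. */
static uint32_t rtas_event_mask(const int *bits, int count)
{
    uint32_t mask = 0;

    for (int i = 0; i < count; i++) {
        /* The reserved bit must not be requested. */
        assert(bits[i] >= 0 && bits[i] <= 31 && bits[i] != EVT_BIT_RESERVED);
        mask |= RTAS_MASK_BIT(bits[i]);
    }
    return mask;
}
```

Under these assumptions, selecting Internal Errors and Environmental and Power Warnings together yields the mask value 0xC0000000.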
R1--1.
For the Platform Interrupt Event option: The platform
must implement the I/O Events and Errors class type along with the
appropriate
ibm,io-events node property to specify the
interrupts.
R1--2.
Platform-specific error and event interrupts
that a platform provider wants the OS to enable must be listed in the
“interrupts” property of the appropriate OF
event class node, as described in
.
R1--3.
To enable
platform-specific error and event interrupt notification, OSs must find the
list of interrupts (described in
) for each error and event class in the
OF device tree, and enable them.
R1--4.
OSs must have interrupt handlers for the
enabled interrupts described in Requirement
, which call the RTAS
check-exception function to determine the cause of the
interrupt.
R1--5.
Platforms which
support error and event reporting must provide information to the OS via
the RTAS
event-scan and
check-exception functions, using the reporting format
described in
.
R1--6.
Optional Extended Error Log information, if
returned by the
event-scan or
check-exception functions, must be in the reporting
format described in
.
R1--7.
To provide control
over performance, the RTAS event reporting functions must not perform any
event data gathering for classes not selected in the event class mask
parameter, nor any extended data gathering if the time critical parameter
is non-zero or the log buffer length parameter does not allow for an
extended error log.
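The gating rule in this requirement can be sketched as a small decision function on the firmware side. This is only an illustration: the structure and function names are hypothetical, and the sketch assumes an 8-byte fixed part, so that any buffer no larger than 8 bytes is taken as leaving no room for an extended error log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the fixed part of the report occupies 8 bytes. */
#define FIXED_PART_LEN 8u

/* Hypothetical description of what the reporting function may gather. */
struct gather_plan {
    uint32_t classes;   /* classes for which event data may be gathered */
    bool     extended;  /* whether extended data may be gathered */
};

static void plan_gathering(uint32_t supported, uint32_t mask,
                           uint32_t time_critical, size_t buf_len,
                           struct gather_plan *plan)
{
    assert(plan != NULL);

    /* Never touch classes that the caller did not select in the mask. */
    plan->classes = supported & mask;

    /* Extended data only if there is something to report, the call is
     * not time critical, and the buffer can hold more than the fixed part. */
    plan->extended = plan->classes != 0 &&
                     time_critical == 0 &&
                     buf_len > FIXED_PART_LEN;
}
```

With a mask that selects only the epow-events class, the plan contains that class alone, and extended gathering is dropped as soon as the time critical parameter is non-zero or the buffer holds only the fixed part.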
R1--8.
To prevent the loss of any event
notifications, the RTAS event reporting functions must be written to gather
and process error and event data without destroying the state information
of events other than the one being processed.
R1--9.
Any interrupts or interrupt controls used for
error and event notification must not be shared between error and event
classes, or with any other types of interrupt mechanisms. This allows the
OS to partition its interrupt handling and prevents blocking of one class
of interrupt by the processing of another.
R1--10.
If a platform chooses to report multiple
event or error sources through a single interrupt, it must ensure that the
interrupt remains asserted or is re-asserted until
check-exception has been used to process all
outstanding errors or events for that interrupt.
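One way an OS interrupt handler might cooperate with this requirement is to keep calling check-exception until no further event is returned for the class. The sketch below is an assumption-laden illustration, not the required handler structure: the status values (0 for a new error log, 1 for no errors found), the callback signature, and all names are hypothetical, and fake_check_exception merely stands in for the real RTAS call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed status values for the check-exception call. */
#define CHECK_EXC_NEW_LOG   0
#define CHECK_EXC_NO_ERRORS 1

/* Hypothetical wrapper type for the RTAS check-exception call. */
typedef int (*check_exception_fn)(uint32_t mask, void *buf, size_t len,
                                  void *ctx);

/* Call check-exception until it stops returning new logs.
 * Returns the number of events that were processed. */
static int drain_class_interrupt(check_exception_fn check, uint32_t class_mask,
                                 void *buf, size_t len, void *ctx)
{
    int handled = 0;

    assert(check != NULL && buf != NULL);
    while (check(class_mask, buf, len, ctx) == CHECK_EXC_NEW_LOG) {
        /* A real handler would pass buf to the OS error logger here. */
        handled++;
    }
    return handled;
}

/* Stand-in for firmware: ctx points to a count of queued events. */
static int fake_check_exception(uint32_t mask, void *buf, size_t len,
                                void *ctx)
{
    int *pending = ctx;

    (void)mask;
    if (*pending == 0)
        return CHECK_EXC_NO_ERRORS;
    memset(buf, 0, len);    /* pretend a log was written */
    (*pending)--;
    return CHECK_EXC_NEW_LOG;
}
```

An OS could equally return after a single call and rely on the interrupt being re-asserted; the loop simply avoids the extra interrupt entries.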
Platform Implementation Note: In Requirement
, although the fixed-part return format
for
check-exception and
event-scan is the same, there are some expectations
about what types of error response may be returned from these functions, as
follows:
The
event-scan function is intended mainly to report
errors that have already been recovered or are non-critical to the OS,
since it is called only periodically. As such, it should never be used to
report a Severity greater than “WARNING”. More critical errors should
be signaled by an interrupt. Typically, the expected response of an OS to
an
event-scan error report is simply to log the error. The
check-exception function may report error information
of any severity.
If
event-scan is reporting a critical error (for example,
a checkstop) that occurred before the current boot session, it should not
report it with a “FATAL” Severity, even though the condition
was fatal at the time the failure occurred. The Severity field informs the
OS of the severity of the event at the time of reporting. Errors which
occurred before a successful reboot are no longer critical. Likewise, the
RTAS Disposition field for such an error should be
“FULLY_RECOVERED”. There is a bit in the extended error log to
indicate these “residual” errors.
Although
check-exception can potentially clean up an error and
return a “FULLY_RECOVERED” disposition, recovery still may not
occur if the MSR
RI bit is not set to 1. It is up to the OS to examine
the RI bit, to determine whether processor state is preserved so that a
return from the machine check interrupt handler can be safely
attempted.
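The expectations in this note can be summarized as two small checks. The sketch below is illustrative only. The numeric encodings of the Severity and Disposition fields and the position of the RI bit in the MSR (taken here as mask 0x2) are assumptions that are not stated in this section, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings of the Severity and Disposition fields. */
enum rtas_severity {
    SEV_NO_ERROR = 0, SEV_EVENT = 1, SEV_WARNING = 2,
    SEV_ERROR_SYNC = 3, SEV_ERROR = 4, SEV_FATAL = 5
};
enum rtas_disposition {
    DISP_FULLY_RECOVERED = 0, DISP_LIMITED_RECOVERY = 1, DISP_NOT_RECOVERED = 2
};

/* Assumption: mask of the Recoverable Interrupt (RI) bit in the MSR. */
#define MSR_RI 0x2ull

/* event-scan should never report a Severity greater than WARNING. */
static bool event_scan_severity_ok(unsigned severity)
{
    assert(severity <= SEV_FATAL);  /* reject values outside the assumed range */
    return severity <= SEV_WARNING;
}

/* A return from the machine check handler may be attempted only if the
 * error was fully recovered AND the MSR saved at interrupt time had RI = 1. */
static bool can_return_from_machine_check(unsigned disposition,
                                          uint64_t saved_msr)
{
    return disposition == DISP_FULLY_RECOVERED && (saved_msr & MSR_RI) != 0;
}
```

The second check reflects the point made above: a “FULLY_RECOVERED” disposition alone is not sufficient, because the processor state may not have been preserved when RI was 0.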