RTAS Error and Event Classes

The table below describes the predefined classes of error and event notifications that can be presented through the check-exception and event-scan RTAS functions. More detailed descriptions of these classes are given later in this chapter. The table also defines nodes in the OF device tree which, through an “interrupts” property, may list the platform-dependent interrupts related to each class. From this information, OSs know which interrupts may be handled by calling check-exception. The OF structure for describing these interrupts is defined in . The table also defines the mask parameter for the check-exception and event-scan RTAS functions, which limits the search for errors and events to the classes specified.

Error and Event Classes with RTAS Function Call Mask

Class Type                          OF Node Name (where the “interrupts”    RTAS Function Call Mask
                                    property lists the interrupts)          (value = 1 enables class)
Internal Errors                     internal-errors                         bit 0
Environmental and Power Warnings    epow-events                             bit 1
Reserved                                                                    bit 2
Hot Plug Events                     hot-plug-events                         bit 3
I/O Events and Errors               ibm,io-events                           bit 4
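
As an illustration of how an OS might build the mask parameter from the bit assignments above, the following C sketch maps each class to a mask value. It assumes the mask is a 32-bit word using IBM bit numbering, where bit 0 is the most significant bit; that assumption is not stated in this section and should be checked against the RTAS function definitions. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the class table. Names are hypothetical. */
enum rtas_event_class_bit {
    RTAS_CLASS_INTERNAL_ERRORS = 0,
    RTAS_CLASS_EPOW_EVENTS     = 1,
    /* bit 2 is reserved */
    RTAS_CLASS_HOT_PLUG_EVENTS = 3,
    RTAS_CLASS_IO_EVENTS       = 4
};

/*
 * Convert a class bit number into a mask value.
 * Assumption: 32-bit mask, IBM bit numbering (bit 0 = most significant bit).
 */
static uint32_t rtas_class_mask(unsigned int bit)
{
    assert(bit <= 31);
    return UINT32_C(0x80000000) >> bit;
}

/* Example: a mask that selects only internal errors and I/O events. */
static uint32_t rtas_example_mask(void)
{
    return rtas_class_mask(RTAS_CLASS_INTERNAL_ERRORS) |
           rtas_class_mask(RTAS_CLASS_IO_EVENTS);
}
```

Under that numbering assumption, a class whose bit is left at 0 in the mask is excluded from the search performed by check-exception or event-scan.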

R1--1. For the Platform Interrupt Event option: The platform must implement the I/O Events and Errors class type along with the appropriate ibm,io-events node property to specify the interrupts.

R1--2. Platform-specific error and event interrupts that a platform provider wants the OS to enable must be listed in the “interrupts” property of the appropriate OF event class node, as described in .

R1--3. To enable platform-specific error and event interrupt notification, OSs must find the list of interrupts (described in ) for each error and event class in the OF device tree, and enable them.

R1--4. OSs must have interrupt handlers for the enabled interrupts described in Requirement , which call the RTAS check-exception function to determine the cause of the interrupt.

R1--5. Platforms which support error and event reporting must provide information to the OS via the RTAS event-scan and check-exception functions, using the reporting format described in .

R1--6. Optional Extended Error Log information, if returned by the event-scan or check-exception functions, must be in the reporting format described in .

R1--7. To provide control over performance, the RTAS event reporting functions must not perform any event data gathering for classes not selected in the event class mask parameter, nor any extended data gathering if the time critical parameter is non-zero or the log buffer length parameter does not allow for an extended error log.

R1--8. To prevent the loss of any event notifications, the RTAS event reporting functions must be written to gather and process error and event data without destroying the state information of events other than the one being processed.

R1--9. Any interrupts or interrupt controls used for error and event notification must not be shared between error and event classes, or with any other types of interrupt mechanisms. This allows the OS to partition its interrupt handling and prevents blocking of one class of interrupt by the processing of another.

R1--10. If a platform chooses to report multiple event or error sources through a single interrupt, it must ensure that the interrupt remains asserted or is re-asserted until check-exception has been used to process all outstanding errors or events for that interrupt.

Platform Implementation Note: In Requirement , although the fixed-part return format for check-exception and event-scan is the same, there are some expectations about what types of error response may be returned from these functions, as follows:

The event-scan function is mainly intended to report only errors that have been recovered or are non-critical to the OS, since it is only called on a periodic basis. As such, it should never be used to report a Severity greater than “WARNING”. More critical errors should be signaled by an interrupt. Typically, the expected response of an OS to an event-scan error report is simply to log the error.

The check-exception function may report error information of any severity.

If event-scan is reporting a critical error (for example, a checkstop) that occurred before the current boot session, it should not report it with a “FATAL” Severity, even though the condition was fatal at the time the failure occurred. The Severity field informs the OS of the severity of the event at the time of reporting. Errors which occurred before a successful reboot are no longer critical. Likewise, the RTAS Disposition field for such an error should be “FULLY_RECOVERED”. There is a bit in the extended error log to indicate these “residual” errors.

Although check-exception can potentially clean up an error and return a “FULLY_RECOVERED” disposition, recovery still may not occur if the MSR RI bit is not set to 1. It is up to the OS to examine the RI bit to determine whether processor state is preserved, so that a return from the machine check interrupt handler can be safely attempted.
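
The expectations in the implementation note can be expressed as two small checks, sketched here in C. This is an illustration only: the numeric encodings of Severity and Disposition and the position of the MSR RI bit are assumptions not given in this section, and all names are hypothetical. The caller is assumed to supply the MSR value that is relevant to the machine check being handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encodings, chosen only so that WARNING orders below FATAL. */
enum rtas_severity {
    SEV_WARNING = 2,
    SEV_FATAL   = 5
};

enum rtas_disposition {
    DISP_FULLY_RECOVERED = 0,
    DISP_NOT_RECOVERED   = 2
};

/* Assumed position of the MSR RI (recoverable interrupt) bit. */
#define MSR_RI_MASK UINT64_C(0x2)

/* event-scan should never report a Severity greater than WARNING. */
static bool event_scan_severity_allowed(enum rtas_severity sev)
{
    return sev <= SEV_WARNING;
}

/*
 * OS-side check after check-exception: a FULLY_RECOVERED disposition is
 * not sufficient on its own; the RI bit must also be set to 1 before a
 * return from the machine check handler is attempted.
 */
static bool can_return_from_machine_check(enum rtas_disposition disp,
                                          uint64_t msr)
{
    assert(disp <= DISP_NOT_RECOVERED);
    return disp == DISP_FULLY_RECOVERED && (msr & MSR_RI_MASK) != 0;
}
```

An OS handler following this sketch would treat any other combination as unrecoverable and take its normal fatal-error path rather than resuming.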