You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
712 lines
25 KiB
XML
712 lines
25 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!--
|
|
Copyright (c) 2016 OpenPOWER Foundation
|
|
|
|
Licensed under the Apache License, Version 2.0 (the "License");
|
|
you may not use this file except in compliance with the License.
|
|
You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
See the License for the specific language governing permissions and
|
|
limitations under the License.
|
|
|
|
-->
|
|
<section xmlns="http://docbook.org/ns/docbook"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude"
|
|
xmlns:xlink="http://www.w3.org/1999/xlink"
|
|
version="5.0"
|
|
xml:id="Internal_Error_Indications">
|
|
|
|
<title>Internal Error Indications</title>
|
|
|
|
<para>Hardware may detect a variety of problems during operation, ranging
|
|
from soft errors which have already been corrected by the time they are
|
|
reported, to hard errors of such severity that the OS (and perhaps the
|
|
hardware) cannot meaningfully continue operation. The mechanisms
|
|
described in
|
|
<xref linkend="dbdoclet.50569337_66258"/> are used to report such errors to the
|
|
OS. This section describes the architectural sources of errors, and
|
|
describes a method that platforms can use to report the error. All OSs
|
|
need to be prepared to encounter the errors reported as they are
|
|
described here. However, in some platforms more sophisticated handling
|
|
may be introduced via RTAS, and the OS may not have to handle the error
|
|
directly. More robust error detection, reporting, and correcting are at
|
|
the option of the hardware vendor.</para>
|
|
|
|
<para>
|
|
<anchor xml:id="dbdoclet.50569337_marker-13885" xreflabel="" />
|
|
<anchor xml:id="dbdoclet.50569337_marker-13886" xreflabel="" />The
|
|
primary architectural mechanism for indicating hardware errors to an OS
|
|
is the machine check interrupt. If an error condition is surfaced by
|
|
placing the system in checkstop, it precludes any immediate participation
|
|
by the OS in handling the error (that is, no error capture, logging,
|
|
recovery, analysis, or notification by the OS). For this reason, the
|
|
machine check interrupt is preferred over going to the checkstop state.
|
|
However, checkstop may be necessary in certain situations where further
|
|
processing represents an exposure to loss of data integrity. To better
|
|
handle such cases, a special hardware mechanism may be provided to gather
|
|
and store residual error data, to be analyzed when the system goes
|
|
through a subsequent successful reboot.</para>
|
|
|
|
<para>Less critical internal errors may also be signaled to the OS
|
|
through a platform-specific interrupt in the “Internal
|
|
Errors” class, or by periodic polling with the
|
|
<emphasis>event-scan</emphasis> RTAS function.</para>
|
|
|
|
<para><emphasis role="bold">Architecture Note:</emphasis> The machine check interrupt will not be listed
|
|
in the OF node for the “Internal Errors” class, since it is a
|
|
standard architectural mechanism. The machine check interrupt mechanism
|
|
is enabled from software or firmware by setting the MSRME bit =1. Upon
|
|
the occurrence of a machine check interrupt, bits in SRR1 will indicate
|
|
the source of the interrupt and SRR0 will contain the address of the next
|
|
instruction that would have been executed if the interrupt had not
|
|
occurred. Depending on where the error is detected, the machine check
|
|
interrupt may be surfaced from within the processor, via logical
|
|
connection to the processor machine check interrupt pin, or via a system
|
|
bus error indicator (for example, Transfer Error Acknowledge -
|
|
TEA).</para>
|
|
|
|
<variablelist>
|
|
|
|
<varlistentry xml:id="dbdoclet.50569337_38528">
|
|
<term><emphasis role="bold">R1-<xref linkend="Internal_Error_Indications"
|
|
xrefstyle="select: labelnumber nopage"/>-1.</emphasis></term>
|
|
<listitem>
|
|
<para>OSs must set
|
|
MSRME=1 prior to the occurrence of a machine check interrupt in order to
|
|
enable machine check processing via the
|
|
<emphasis>check-exception</emphasis> RTAS function.</para>
|
|
|
|
<para>
|
|
<emphasis role="bold">Architecture Note:</emphasis> Requirement
|
|
<xref linkend="dbdoclet.50569337_38528"/> is not applicable when the FWNMI
|
|
option is used.</para>
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry xml:id="dbdoclet.50569337_36603">
|
|
<term><emphasis role="bold">R1-<xref linkend="Internal_Error_Indications"
|
|
xrefstyle="select: labelnumber nopage"/>-2.</emphasis></term>
|
|
<listitem>
|
|
<para>For
|
|
hardware-detected errors, platforms must generate error indications as
|
|
described in
|
|
<xref linkend="dbdoclet.50569337_30773"/>, unless the error can be handled
|
|
through a less severe platform-specific interrupt, or the nature of the
|
|
error forces a checkstop condition.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><emphasis role="bold">R1-<xref linkend="Internal_Error_Indications"
|
|
xrefstyle="select: labelnumber nopage"/>-3.</emphasis></term>
|
|
<listitem>
|
|
<para>Platforms which detect and report the
|
|
errors described in
|
|
<xref linkend="dbdoclet.50569337_30773"/> must provide information to the OS
|
|
via the RTAS
|
|
<emphasis>check-exception</emphasis> function, using the reporting format
|
|
described in
|
|
<xref linkend="dbdoclet.50569337_21249"/>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry xml:id="dbdoclet.50569337_16228">
|
|
<term><emphasis role="bold">R1-<xref linkend="Internal_Error_Indications"
|
|
xrefstyle="select: labelnumber nopage"/>-4.</emphasis></term>
|
|
<listitem>
|
|
<para>To prevent error
|
|
propagation and allow for synchronization of error handling, all
|
|
processors in a multi-processor system must receive any machine check
|
|
interrupt signaled via the external machine check interrupt pin.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
|
|
<para>
|
|
<emphasis role="bold">Platform Implementation Notes:</emphasis>
|
|
</para>
|
|
|
|
<orderedlist>
|
|
<listitem>
|
|
|
|
<para>The intent of Requirement
|
|
<xref linkend="dbdoclet.50569337_36603"/> is to define standard error
|
|
notification mechanisms for different hardware error types. For most
|
|
hardware errors, the signaling mechanism is the machine check interrupt,
|
|
although this requirement hints at the use of a less severe
|
|
platform-specific interrupt for some errors. The important point here is
|
|
actually whether the error can be handled through that interrupt. Simply
|
|
using an external interrupt to signal the error is not sufficient. The
|
|
hardware and RTAS also need to limit the propagation of corrupted data,
|
|
prevent loss of error state data, and support the cleanup and recovery of
|
|
such an error. Since the response to an external interrupt may be
|
|
significantly slower than a machine check, and in fact may be masked, the
|
|
error should not require immediate action on the part of the OS and/or
|
|
RTAS. In addition, external interrupts (except external machine check
|
|
interrupts) are reported to only one processor, so operations by the
|
|
other processors in an MP system should not be impacted by this error
|
|
unless they specifically try to access the failing hardware element. To
|
|
summarize, platforms should not use platform-specific interrupts to
|
|
signal hardware errors unless there is a complete hardware/RTAS platform
|
|
solution for handling such errors.</para>
|
|
|
|
</listitem>
|
|
<listitem>
|
|
|
|
<para>The intent of Requirement
|
|
<xref linkend="dbdoclet.50569337_16228"/> is that most hardware errors would be
|
|
signaled simultaneously to all processors, so that processors could
|
|
synchronize the error handling process; that is, one processor would be
|
|
chosen to do the call to
|
|
<emphasis>check-exception</emphasis>, while the other processors remained
|
|
idle so that they would not interfere with RTAS while it gathered machine
|
|
check error data. While this is a straightforward wiring solution for
|
|
errors signaled via the external machine check interrupt pin, that is not
|
|
the case for internal processor errors or processor bus errors.
|
|
Typically, only one processor will see such errors. An internal processor
|
|
error can be identified with just the contents of SRR1, and so can be
|
|
handled without synchronization with other processors. Processor bus
|
|
errors, however, can be more difficult, especially if the error is
|
|
propagated up to the processor bus from a lower-level bus. In general,
|
|
such propagation should be avoided, and such errors should be signaled
|
|
through the machine check interrupt pin to ensure proper error
|
|
handling.</para>
|
|
|
|
</listitem>
|
|
</orderedlist>
|
|
|
|
<section xml:id="dbdoclet.50569337_52952">
|
|
<title>Error Indication Mechanisms</title>
|
|
|
|
<para>
|
|
<xref linkend="dbdoclet.50569337_30773"/> describes the mechanisms by
|
|
which software will be notified of the occurrence of operational failures
|
|
during the types of data transfer operations listed below. The assumption
|
|
here is that the error notification can occur only if a hardware
|
|
mechanism for error detection (for example, a parity checker) is present.
|
|
In cases where there is no specific error detection mechanism, the
|
|
resulting condition, and whether the software will eventually recognize
|
|
that condition as a failure, is undefined.</para>
|
|
<para> </para>
|
|
<table frame="all" pgwide="1" xml:id="dbdoclet.50569337_30773">
|
|
<title>Error Indications for System Operations</title>
|
|
<tgroup cols="6">
|
|
<colspec colname="c1" colwidth="10*" />
|
|
<colspec colname="c2" colwidth="10*" />
|
|
<colspec colname="c3" colwidth="20*" />
|
|
<colspec colname="c4" colwidth="20*" />
|
|
<colspec colname="c5" colwidth="20*" />
|
|
<colspec colname="c6" colwidth="20*" />
|
|
<thead>
|
|
<row>
|
|
<entry>
|
|
<para>
|
|
<emphasis role="bold">Initiator</emphasis>
|
|
</para>
|
|
</entry>
|
|
<entry>
|
|
<para>
|
|
<emphasis role="bold">Target</emphasis>
|
|
</para>
|
|
</entry>
|
|
<entry>
|
|
<para>
|
|
<emphasis role="bold">Operation</emphasis>
|
|
</para>
|
|
</entry>
|
|
<entry>
|
|
<para>
|
|
<emphasis role="bold">Error Type(if detected)</emphasis>
|
|
</para>
|
|
</entry>
|
|
<entry>
|
|
<para>
|
|
<emphasis role="bold">Indication to Software</emphasis>
|
|
</para>
|
|
</entry>
|
|
<entry>
|
|
<para>
|
|
<emphasis role="bold">Comments</emphasis>
|
|
</para>
|
|
</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>
|
|
<para>Processor</para>
|
|
</entry>
|
|
<entry>
|
|
<para>N/A</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Internal</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Various</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Some may cause checkstop</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry morerows="11">
|
|
<para>Processor</para>
|
|
</entry>
|
|
<entry morerows="11">
|
|
<para>Memory</para>
|
|
</entry>
|
|
<entry morerows="4">
|
|
<para>Load</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Invalid address</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para> </para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>System bus time-out</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para> </para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>Address parity on system bus</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para> </para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>Data parity on system bus</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para> </para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>Memory parity or uncorrectable ECC</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para> </para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry morerows="3">
|
|
<para>Store</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Invalid address</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para> </para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>System bus time-out</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para> </para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>Address parity on system bus</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para> </para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>Data parity on system bus</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para> </para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>External cache load</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Memory parity or uncorrectable ECC</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Associated with Instruction Fetch or Data Load</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>External cache flush</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Cache parity or</para>
|
|
<para>uncorrectable ECC</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para> </para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>External cache access</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Cache parity or</para>
|
|
<para>uncorrectable ECC</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Associated with Instruction Fetch or Data Transfer</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>Processor</para>
|
|
</entry>
|
|
<entry>
|
|
<para>I/O</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Load or Store</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Various errors between the processor and the I/O
|
|
fabric</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para>I/O fabrics include hubs and bridges and interconnecting
|
|
buses or links.</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry morerows="3">
|
|
<para>Processor</para>
|
|
</entry>
|
|
<entry morerows="3">
|
|
<para>I/O bus configuration space</para>
|
|
</entry>
|
|
<entry morerows="1">
|
|
<para>Read</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Various, except no response from IOA</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Firmware receives a machine check, OS receives
|
|
all=1’s data along with a
|
|
<emphasis>Status</emphasis> of -1 from the RTAS call</para>
|
|
</entry>
|
|
<entry>
|
|
<para>If EEH is implemented and enabled, firmware does not get
|
|
a machine check and the PE is in the EEH Stopped State on
|
|
return from the RTAS call</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>No response from an IOA</para>
|
|
</entry>
|
|
<entry>
|
|
<para>All-1’s data returned, along with a
|
|
“Success”
|
|
<emphasis>Status</emphasis> from the RTAS call</para>
|
|
</entry>
|
|
<entry>
|
|
<para>If EEH is implemented and enabled, the PE is not in the
|
|
EEH Stopped State on return from the RTAS call</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry morerows="1">
|
|
<para>Write</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Various, except no response from IOA</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Firmware receives a machine check, OS receives a
|
|
<emphasis>Status</emphasis> of -1 from the RTAS call</para>
|
|
</entry>
|
|
<entry>
|
|
<para>If EEH is implemented and enabled, firmware does not get
|
|
a machine check and the PE is in the EEH Stopped State on
|
|
return from the RTAS call</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>No response from an IOA</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Operation ignored, OS receives a “Success”
|
|
<emphasis>Status</emphasis> from the RTAS call</para>
|
|
</entry>
|
|
<entry>
|
|
<para>If EEH is implemented and enabled, the PE is in the EEH
|
|
Stopped State on return from the RTAS call</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry morerows="3">
|
|
<para>Processor</para>
|
|
</entry>
|
|
<entry morerows="3">
|
|
<para>I/O bus;</para>
|
|
<para>I/O Space</para>
|
|
<para>or</para>
|
|
<para>Memory Space</para>
|
|
</entry>
|
|
<entry morerows="1">
|
|
<para>Load</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Various, except no response from IOA</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para>If EEH is implemented and enabled, no machine check is
|
|
received, all-1’s data is returned, and the PE enters the
|
|
EEH Stopped State</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>No response from an IOA</para>
|
|
</entry>
|
|
<entry>
|
|
<para>All-1’s data returned</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Invalid address, broken IOA, or configuration cycle to
|
|
non-existent IOA; if EEH is implemented and enabled, the PE
|
|
enters the EEH Stopped State</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry morerows="1">
|
|
<para>Store</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Various, except no response from IOA</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para>If EEH is implemented and enabled, no machine check is
|
|
received and the PE enters the EEH Stopped State</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>No response from IOA</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Ignore Store</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Invalid address, broken IOA, or configuration cycle to
|
|
non-existent IOA; if EEH is implemented and enabled, the PE
|
|
enters the EEH Stopped State</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>Processor</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Invalid address (addressing an “undefined”
|
|
address area)</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Load or Store</para>
|
|
</entry>
|
|
<entry>
|
|
<para>No response from system</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para> </para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>I/O</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Memory</para>
|
|
</entry>
|
|
<entry>
|
|
<para>DMA - either direction</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Various, including but not limited to:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>Invalid addr accepted by a bridge</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>TCE extent</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>TCE Page Mapping and Control mis-match or invalid
|
|
TCE</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check unless reportable directly to the IOA in a
|
|
way that allows the IOA to signal the error to its device
|
|
driver</para>
|
|
</entry>
|
|
<entry>
|
|
<para>If EEH is implemented and enabled, no machine check is
|
|
received and the PE enters the EEH Stopped State</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>I/O</para>
|
|
</entry>
|
|
<entry>
|
|
<para>I/O</para>
|
|
</entry>
|
|
<entry>
|
|
<para>DMA - either direction</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Various</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Machine check unless reportable to master of the transfer
|
|
in a way that allows master to recover</para>
|
|
</entry>
|
|
<entry>
|
|
<para> </para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>I/O</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Invalid address</para>
|
|
</entry>
|
|
<entry>
|
|
<para>DMA - either direction</para>
|
|
</entry>
|
|
<entry>
|
|
<para>No response from any IOA</para>
|
|
</entry>
|
|
<entry>
|
|
<para>PCI IOA</para>
|
|
<para>master-aborts</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Signal device driver using an external interrupt</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>PCI IOA</para>
|
|
</entry>
|
|
<entry>
|
|
<para>-</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Any</para>
|
|
</entry>
|
|
<entry>
|
|
<para>Internal, indicated by SERR# or ERR_FATAL</para>
|
|
</entry>
|
|
<entry>
|
|
<para>SERR# or ERR_FATAL, causing machine check</para>
|
|
</entry>
|
|
<entry>
|
|
<para>If EEH is implemented and enabled, no machine check is
|
|
received and the PE enters the EEH Stopped State</para>
|
|
</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<para><emphasis role="bold">Implementation Note:</emphasis> IOAs should, whenever possible, detect the
|
|
occurrence of PCI errors on DMA and report them via an external interrupt
|
|
(for possible device driver recovery) or retry the operation. Since
|
|
system state has not been lost, reporting these errors via a machine
|
|
check to the CPUs is inappropriate. Some devices or device drivers may
|
|
cause a catastrophic error. Systems which wish to recover from these
|
|
types of errors should choose devices and device drivers which are
|
|
designed to handle them correctly.</para>
|
|
|
|
</section>
|
|
|
|
</section>
|