<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright (c) 2016 OpenPOWER Foundation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<appendix xmlns="http://docbook.org/ns/docbook"
xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en" xml:id="dbdoclet.50569385_36278">
<title>
When to use: Fault vs. Error Log Indicators (Lightpath Mode)</title>
<para>This appendix gives
<emphasis>highly</emphasis> recommended Service Indicator activation models
for typical system issues when the Lightpath mode is implemented. The
purpose of this appendix is to promote consistency across platforms and to
answer common questions about how to handle specific issues. These models are
recommended rather than required because of the range of systems
involved, specifically the different types of physical
layouts (for example: deskside, blade and blade chassis, rack-mounted, and
particularly high-end racks).</para>
<para>This appendix does
<emphasis>not</emphasis> change the architectural requirements specified in
other parts of this document, nor the requirement for implementations to
support those requirements. If there are any inconsistencies between this
appendix and the requirements in the rest of this document, the requirements
take precedence over this appendix. It is very important, therefore, that
designers understand the requirements in this architecture, and more
specifically, those in
<xref linkend="dbdoclet.50569347_86824" />.</para>
<para><xref linkend="dbdoclet.50569385_72358" /> gives the recommended models. The
general model, though, is still dictated by the following requirement, copied
here from the <xref linkend="LoPAR.Platform" />:</para>
<informalfigure>
<mediaobject>
<imageobject role="html">
<imagedata fileref="figures/PAPR-73.gif" format="GIF" scalefit="1" />
</imageobject>
<imageobject role="fo">
<imagedata contentdepth="100%" fileref="figures/PAPR-73.gif"
format="GIF" scalefit="1" width="100%" />
</imageobject>
</mediaobject>
</informalfigure>
<table frame="all" pgwide="1" xml:id="dbdoclet.50569385_72358">
<title>Service Indicator Activation Models for Typical System
Issues (Lightpath Mode)</title>
<tgroup cols="7">
<colspec colname="c1" colwidth="15*" />
<colspec colname="c2" colwidth="25*" />
<colspec colname="c3" colwidth="14*" align="center" />
<colspec colname="c4" colwidth="10*" align="center" />
<colspec colname="c5" colwidth="8*" align="center" />
<colspec colname="c6" colwidth="8*" align="center" />
<colspec colname="c7" colwidth="20*" />
<thead>
<row>
<entry morerows="1" align="center" valign="middle">
<para>
<emphasis role="bold">Component or problem area</emphasis>
</para>
</entry>
<entry morerows="1" align="center" valign="middle">
<para>
<emphasis role="bold">Problem or issue</emphasis>
</para>
</entry>
<entry nameend="c4" namest="c3">
<para>
<emphasis role="bold">Indicator activation?
<emphasis><?linebreak?>(see notes
<xref linkend="dbdoclet.50569385_14395" xrefstyle="select: nopage" />,
<xref linkend="dbdoclet.50569385_43948" xrefstyle="select: nopage" />)</emphasis></emphasis>
</para>
</entry>
<entry morerows="1" valign="middle">
<para>
<emphasis role="bold">Entry made in Service Focal Point error
log?</emphasis>
</para>
</entry>
<entry morerows="1" valign="middle">
<para>
<emphasis role="bold">IBM Call Home?</emphasis>
</para>
</entry>
<entry morerows="1" align="center" valign="middle">
<para>
<emphasis role="bold">Comments</emphasis>
</para>
</entry>
</row>
<row>
<entry>
<para>FRU Fault indicator?<?linebreak?>(see notes
<xref linkend="dbdoclet.50569385_57037" xrefstyle="select: nopage" />,
<xref linkend="dbdoclet.50569385_69283" xrefstyle="select: nopage" />)</para>
</entry>
<entry>
<para>Error Log indicator?<?linebreak?>(see note
<xref linkend="dbdoclet.50569385_69283" xrefstyle="select: nopage" />)</para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<para>All</para>
</entry>
<entry>
<para>Any not already covered in this table</para>
</entry>
<entry nameend="c6" namest="c3">
<para>Consult with the xipSIA architecture team for the proper
behavior</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry morerows="3">
<para>Power supply or fan</para>
</entry>
<entry>
<para>Redundant optional one missing</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>This row for information only (this is not a Serviceable
Event).</para>
</entry>
</row>
<row>
<entry>
<para>Redundant non-optional one missing</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>Failed redundant</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>Failed non-redundant</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>Power regulator not in power supply</para>
</entry>
<entry>
<para>Failed</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>Treated the same as any other FRU or component on a FRU which
fails.</para>
</entry>
</row>
<row>
<entry morerows="1">
<para>Power</para>
</entry>
<entry>
<para>Insufficient power with all power supplies installed and
operational for power domain</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry morerows="1">
<para>For components that will not power up due to lack of power
(for example, a blade in a chassis), and when that component
implements a fast blink capability for the green power LED, that
component&#8217;s green power LED remains in the fast blink
mode.</para>
</entry>
</row>
<row>
<entry>
<para>Insufficient power with missing power supply</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
</row>
<row>
<entry morerows="1">
<para>Disabled button pressed</para>
</entry>
<entry>
<para>Unable to power on or off component due to power button being
disabled</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry morerows="1">
<para>These rows for information only (these are not Serviceable
Events).</para>
</entry>
</row>
<row>
<entry>
<para>KVM or media tray button pressed, but not active because
disabled</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>no</para>
</entry>
</row>
<row>
<entry morerows="1">
<para>Temperature detected out of tolerance</para>
</entry>
<entry>
<para>Warning only (no performance throttling or shut-down)</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>brand dependent</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>Performance throttling or shut-down due to over-temp
condition</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>FRU Fault indicator must be at the component that is throttled or
shut down (temperature indicators, which are not architected, do
not roll up to Enclosure Fault indicators).</para>
</entry>
</row>
<row>
<entry morerows="2">
<para>Memory (DIMMs)</para>
</entry>
<entry>
<para>No memory installed at all, or memory that is installed is
mismatched</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>Invalid memory configuration above the base memory (missing
memory, mismatched memory, unsupported memory)</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>Failed or predicted to fail</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>System VPD</para>
</entry>
<entry>
<para>Invalid or missing</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>Lack of some of the required fields in the call-home may
result in the call-home being flagged as a lack of
entitlement.</para>
</entry>
</row>
<row>
<entry morerows="1">
<para>Battery</para>
</entry>
<entry>
<para>Missing or failed</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>brand dependent</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>Rechargeable battery needing reconditioning</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>CPU</para>
</entry>
<entry>
<para>Failed or predicted to fail</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>Planar or CEC</para>
</entry>
<entry>
<para>Failed</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>Disk</para>
</entry>
<entry>
<para>Failed or predicted to fail</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry morerows="1">
<para>Boot device</para>
</entry>
<entry>
<para>Missing boot device</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>Corrupt image</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>I/O adapter</para>
</entry>
<entry>
<para>Failed or non-recoverable EEH error</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry morerows="1">
<para>Service Processor (FSP, BMC, IMM)</para>
</entry>
<entry>
<para>Hardware failure</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>Firmware failure</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>brand dependent</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry morerows="1">
<para>BIOS or Flash</para>
</entry>
<entry>
<para>Corrupted single side</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>Both sides corrupted</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>All</para>
</entry>
<entry>
<para>Unisolated event</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>See also Requirement
<xref linkend="dbdoclet.50569347_86824" /></para>
</entry>
</row>
<row>
<entry>
<para>Switch</para>
</entry>
<entry>
<para>Failed</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>System chassis controller (AMM, CME, ITME)</para>
</entry>
<entry>
<para>Failed</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>Midplane</para>
</entry>
<entry>
<para>Failed</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>no</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
<row>
<entry>
<para>Cables</para>
</entry>
<entry>
<para>Failed or missing</para>
</entry>
<entry>
<para>N/A</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>yes</para>
</entry>
<entry>
<para>&#160;</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
<emphasis role="bold">Notes:</emphasis>
</para>
<orderedlist>
<listitem xml:id="dbdoclet.50569385_14395">
<para>Never activate both a
Fault indicator and an Error Log indicator for the same problem. See also
Requirement
<xref linkend="dbdoclet.50569347_86824" />, referenced immediately above
<xref linkend="dbdoclet.50569385_72358" />.</para>
</listitem>
<listitem xml:id="dbdoclet.50569385_43948">
<para>Fault indicators
above the FRU Fault indicator are not specified here, but the requirements
specify that a FRU Fault is rolled up to the next higher level indicator
(specifically, the Enclosure Fault indicator).</para>
</listitem>
<listitem xml:id="dbdoclet.50569385_57037">
<para>Enclosure Fault
indicators and above are only roll-up indicators and are never activated
without a FRU Fault indicator being activated. Therefore, the column in
<xref linkend="dbdoclet.50569385_72358" /> indicates a FRU Fault indicator.
That is, if no FRU Fault indicator exists for the problem, then the Error Log
indicator is used instead (per Requirement
<xref linkend="dbdoclet.50569347_86824" />, referenced immediately above
<xref linkend="dbdoclet.50569385_72358" />).</para>
</listitem>
<listitem xml:id="dbdoclet.50569385_69283">
<para>The activation of the
Error Log indicator (previously known as the System Information (Attention)
indicator) and of Fault indicators is regulated by the following requirements,
among others:</para>
<itemizedlist>
<listitem><para><xref linkend="dbdoclet.50569347_17215"/></para></listitem>
<listitem><para><xref linkend="dbdoclet.50569347_38463"/></para></listitem>
</itemizedlist>
</listitem>
</orderedlist>
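<para>The selection rule described in the first three notes above can be
sketched in code. The following Python fragment is illustrative only and is
not part of the architecture: the
<literal>Problem</literal> record, its field names, and the
<literal>indicators_to_activate</literal> function are hypothetical names
introduced for this example. It is a simplification that assumes a problem is
either isolated to a FRU that has its own FRU Fault indicator or is not. The
table and the referenced requirements take precedence wherever this sketch
differs from them.</para>
<programlisting language="python"><![CDATA[
from dataclasses import dataclass
from enum import Enum
from typing import List


class Indicator(Enum):
    FRU_FAULT = "FRU Fault"
    ENCLOSURE_FAULT = "Enclosure Fault"
    ERROR_LOG = "Error Log"


@dataclass(frozen=True)
class Problem:
    # All three fields are assumptions made for this sketch.
    serviceable: bool              # False for the "information only" rows
    isolated_to_fru: bool          # the problem is isolated to a specific FRU
    has_fru_fault_indicator: bool  # that FRU provides a FRU Fault indicator


def indicators_to_activate(problem: Problem) -> List[Indicator]:
    """Return the indicators to activate for one problem.

    A Fault indicator and the Error Log indicator are never returned
    together for the same problem.
    """
    if not problem.serviceable:
        # Not a Serviceable Event: no indicator is activated.
        return []
    if problem.isolated_to_fru and problem.has_fru_fault_indicator:
        # A FRU Fault rolls up to the Enclosure Fault indicator.
        return [Indicator.FRU_FAULT, Indicator.ENCLOSURE_FAULT]
    # No FRU Fault indicator exists for the problem: use the Error Log
    # indicator instead.
    return [Indicator.ERROR_LOG]
]]></programlisting>
<para>Under these assumptions, a failed disk that has its own FRU Fault
indicator yields the FRU Fault indicator plus the Enclosure Fault roll-up,
while a failed cable, for which the table lists the FRU Fault indicator as
not applicable, yields only the Error Log indicator.</para>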
</appendix>