You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
337 lines
14 KiB
XML
337 lines
14 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!--
|
|
Copyright (c) 2016, 2020 OpenPOWER Foundation
|
|
|
|
Licensed under the Apache License, Version 2.0 (the "License");
|
|
you may not use this file except in compliance with the License.
|
|
You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
See the License for the specific language governing permissions and
|
|
limitations under the License.
|
|
|
|
-->
|
|
<section xmlns="http://docbook.org/ns/docbook"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude"
|
|
xmlns:xlink="http://www.w3.org/1999/xlink"
|
|
version="5.0"
|
|
xml:id="dbdoclet.50569337_17513">
|
|
|
|
<title>Environmental and Power Warnings</title>
|
|
|
|
<para>Environmental and Power Warnings (EPOW) is an option that provides
|
|
a means for the platform to inform the OS of these types of events. The
|
|
intent is to enable the OS to provide basic information to the user about
|
|
environmental and power problems and to minimize the logical damage done
|
|
by these problems. For example, an OS might want to abort all disk I/O
|
|
operations in progress to ensure that disk sectors are not corrupted by
|
|
the loss of power. Even on platforms that provide hardware protection of
|
|
data during environmental events, EPOW notification allows discrimination
|
|
between I/O errors caused by hardware failures versus EPOW events.</para>
|
|
|
|
<para>These warnings include action codes that the platform can use to
|
|
influence the OS behavior when various hardware components fail. For
|
|
example, a fan failure where the system can continue to operate in the
|
|
safe cooling range may just generate an action code of WARN_COOLING, but
|
|
a fan failure where the system cannot operate in the safe cooling range
|
|
may generate an action code of SYSTEM_HALT.</para>
|
|
|
|
<para><emphasis role="bold">Implementation Note:</emphasis> Hardware cannot assume that the OS will
|
|
process or take action on these warnings. These warnings are only
|
|
provided to the OS in order to allow the OS a chance to cleanly abort
|
|
operations in progress at the time of the warning. Hardware still assumes
|
|
responsibility for preventing hardware damage due to environmental or
|
|
power problems.</para>
|
|
|
|
<para>An OS that wants to be EPOW-aware will look for the
|
|
<emphasis>epow-events</emphasis> node in the OF device tree, enable the
|
|
interrupts listed in its
|
|
<literal>“interrupts”</literal> property, and provide an
|
|
interrupt handler to call
|
|
<emphasis>check-exception</emphasis> when one of those interrupts are
|
|
received.</para>
|
|
|
|
<para>When an EPOW event occurs, whether reported by
|
|
<emphasis>check-exception</emphasis> or
|
|
<emphasis>event-scan</emphasis>, RTAS will directly pass back the EPOW
|
|
sensor value as part of the Extended Error Log format as described in
|
|
<xref linkend="dbdoclet.50569337_39498"/>, assuming the extended log is
|
|
requested. Doing so avoids the need for the OS to make an extra RTAS call
|
|
to obtain the sensor value. For critical power problems, the
|
|
<emphasis>check-exception</emphasis> function is used to immediately
|
|
report changes of state to the OS, while the
|
|
<emphasis>get-sensor-state</emphasis> function allows the OS to monitor
|
|
the condition (for example, loss of AC power) to see if the problem
|
|
corrects itself.</para>
|
|
|
|
<variablelist>
|
|
|
|
<varlistentry>
|
|
<term><emphasis role="bold">R1-<xref linkend="dbdoclet.50569337_17513"
|
|
xrefstyle="select: labelnumber nopage"/>-1.</emphasis></term>
|
|
<listitem>
|
|
<para> If the platform supports Environmental and
|
|
Power Warnings by including a EPOW device tree entry, then the platform
|
|
must support the EPOW sensor for the
|
|
<emphasis>get-sensor-state</emphasis> RTAS function.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><emphasis role="bold">R1-<xref linkend="dbdoclet.50569337_17513"
|
|
xrefstyle="select: labelnumber nopage"/>-2.</emphasis></term>
|
|
<listitem>
|
|
<para> The EPOW sensor, if provided, must contain
|
|
the EPOW action code (defined in
|
|
<xref linkend="dbdoclet.50569337_45557"/>) in the least significant 4
|
|
bits. In cases where multiple EPOW actions are required, the action code
|
|
with the highest numerical value (where 0 is lowest and 7 is highest)
|
|
must be presented to the OS. The platform may implement any subset of
|
|
these action codes, but must operate as described in
|
|
<xref linkend="dbdoclet.50569337_45557"/> for those it does implement.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><emphasis role="bold">R1-<xref linkend="dbdoclet.50569337_17513"
|
|
xrefstyle="select: labelnumber nopage"/>-3.</emphasis></term>
|
|
<listitem>
|
|
<para>To ensure adequate response time,
|
|
platforms which implement the EPOW_MAIN_ENCLOSURE or EPOW_POWER_OFF
|
|
action codes must do so via interrupt and
|
|
<emphasis>check-exception</emphasis> notification, rather than by
|
|
<emphasis>event-scan</emphasis> notification.
|
|
<emphasis>(Except as modified by Requirement
|
|
<xref linkend="dbdoclet.50569337_83439"/>)</emphasis></para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry xml:id="dbdoclet.50569337_83439">
|
|
<term><emphasis role="bold">R1-<xref linkend="dbdoclet.50569337_17513"
|
|
xrefstyle="select: labelnumber nopage"/>-4.</emphasis></term>
|
|
<listitem>
|
|
<para>If the platform
|
|
does not notify EPOW_MAIN_ENCLOSURE or EPOW_POWER_OFF via interrupt, then
|
|
the platform must protect data on I/O storage devices from corruption due
|
|
to the EPOW event.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><emphasis role="bold">R1-<xref linkend="dbdoclet.50569337_17513"
|
|
xrefstyle="select: labelnumber nopage"/>-5.</emphasis></term>
|
|
<listitem>
|
|
<para>For interrupt-driven EPOW events, the
|
|
platform must ensure that an EPOW interrupt is not lost in the case where
|
|
a numerically higher-priority EPOW event occurs between the time when
|
|
<emphasis>check-exception</emphasis> gathers the sensor value and when it
|
|
resets the interrupt.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><emphasis role="bold">R1-<xref linkend="dbdoclet.50569337_17513"
|
|
xrefstyle="select: labelnumber nopage"/>-6.</emphasis></term>
|
|
<listitem>
|
|
<para>For SYSTEM_SHUTDOWN EPOW class 3, after a
|
|
SYSTEM_SHUTDOWN EPOW commences and when the delay interval timer expires,
|
|
if an
|
|
<emphasis role="bold"><literal>“ibm,recoverable-epow3”</literal></emphasis>
|
|
encode-null
|
|
property in the
|
|
<emphasis role="bold"><literal>/rtas</literal></emphasis> node is present, then the OS code that manages
|
|
preserving storage must check the EPOW sensor state and the
|
|
<emphasis role="bold"><literal>“ibm,request-partition-shutdown”</literal></emphasis>
|
|
property if present. A normal boot must only occur when the EPOW sensor state
|
|
indicates that the EPOW condition requiring a shutdown no longer exists
|
|
(EPOW 0) and the
|
|
<emphasis role="bold"><literal>“ibm,request-partition-shutdown”</literal></emphasis>
|
|
is not present. Otherwise, the code that manages preserving storage must take
|
|
the action as identified by the property.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
|
|
<para><emphasis role="bold">Implementation Note:</emphasis> One way for hardware to prevent the loss of an
|
|
EPOW interrupt is by deferring the generation of a new EPOW interrupt
|
|
until the existing EPOW interrupt is reset by a call to the RTAS
|
|
<emphasis>check-exception</emphasis> function. Another way is to ignore
|
|
resets to the interrupt until all EPOW events have been reported.</para>
|
|
|
|
<table frame="all" pgwide="1" xml:id="dbdoclet.50569337_45557">
|
|
<title>EPOW Action Codes</title>
|
|
<tgroup cols="3">
|
|
<colspec colname="c1" colwidth="30*" align="center" />
|
|
<colspec colname="c2" colwidth="10*" align="center" />
|
|
<colspec colname="c3" colwidth="60*" />
|
|
<thead>
|
|
<row>
|
|
<entry>
|
|
<para>
|
|
<emphasis role="bold">Action Code</emphasis>
|
|
</para>
|
|
</entry>
|
|
<entry>
|
|
<para>
|
|
<emphasis role="bold">Value</emphasis>
|
|
</para>
|
|
</entry>
|
|
<entry align="center">
|
|
<para>
|
|
<emphasis role="bold">Description</emphasis>
|
|
</para>
|
|
</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody valign="middle" >
|
|
<row>
|
|
<entry>
|
|
<para>EPOW_RESET/MESSAGE</para>
|
|
</entry>
|
|
<entry>
|
|
<para>0</para>
|
|
</entry>
|
|
<entry>
|
|
<para>No EPOW event is pending. This action code is the lowest
|
|
priority.</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>WARN_COOLING</para>
|
|
</entry>
|
|
<entry>
|
|
<para>1</para>
|
|
</entry>
|
|
<entry>
|
|
<para>A non-critical cooling problem exists. An EPOW-aware OS
|
|
logs the EPOW information.</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>WARN_POWER</para>
|
|
</entry>
|
|
<entry>
|
|
<para>2</para>
|
|
</entry>
|
|
<entry>
|
|
<para>A non-critical power problem exists. An EPOW-aware OS
|
|
logs the EPOW information.</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>SYSTEM_SHUTDOWN</para>
|
|
</entry>
|
|
<entry>
|
|
<para>3</para>
|
|
</entry>
|
|
<entry>
|
|
<para>The system must be shut down. An EPOW-aware OS logs the
|
|
EPOW error log information, then schedules the system to be
|
|
shut down to begin after an OS defined delay internal (default
|
|
is 10 minutes.)</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>SYSTEM_HALT</para>
|
|
</entry>
|
|
<entry>
|
|
<para>4</para>
|
|
</entry>
|
|
<entry>
|
|
<para>The system must be shut down quickly. An EPOW-aware OS
|
|
logs the EPOW error log information, then schedules the system
|
|
to be shut down in 20 seconds.</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>EPOW_MAIN_ENCLOSURE</para>
|
|
</entry>
|
|
<entry>
|
|
<para>5</para>
|
|
</entry>
|
|
<entry>
|
|
<para>The system may lose power. The hardware ensures that at
|
|
least 4 milliseconds of power within operational thresholds is
|
|
available after signalling an interrupt. An EPOW-aware OS
|
|
performs any desired functions, masks the EPOW interrupt, and
|
|
monitors the sensor to see if the condition changes. Hardware
|
|
does not clear this action code until the system resumes
|
|
operation within safe power levels.</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>EPOW_POWER_OFF</para>
|
|
</entry>
|
|
<entry>
|
|
<para>7</para>
|
|
</entry>
|
|
<entry>
|
|
<para>The system will lose power. The hardware ensures that at
|
|
least 4 milliseconds of power within operational thresholds is
|
|
available after signalling an interrupt. An EPOW-aware OS
|
|
performs any desired operations, then attempts to turn system
|
|
power off. An EPOW-aware OS does not clear the EPOW interrupt
|
|
for this action code. This action code is the highest
|
|
priority.</para>
|
|
</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<para><emphasis role="bold">Software Implementation Note:</emphasis> A recommended OS processing method
|
|
for an EPOW_MAIN_ENCLOSURE event is as follows:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
|
|
<para>Prepare for shutdown, mask the EPOW interrupt, and wait for 50
|
|
milliseconds. Then call get-sensor-state to read the EPOW sensor.</para>
|
|
|
|
</listitem>
|
|
<listitem>
|
|
|
|
<para>If the EPOW action code is unchanged, wait an additional 50
|
|
milliseconds.</para>
|
|
|
|
</listitem>
|
|
<listitem>
|
|
|
|
<para>If the action code is EPOW_POWER_OFF, attempt to power off.
|
|
Otherwise, the power condition may have stabilized, so interrupts may be
|
|
enabled and normal operation resumed.</para>
|
|
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
<emphasis role="bold">Implementation Note:</emphasis> EPOW_RESET (EPOW action code 0)
|
|
may be used to indicate that a previously reported EPOW condition is no
|
|
longer present. For instance, a system might see a WARN_POWER action code
|
|
for a loss of a redundant line input power. EPOW_RESET may subsequently
|
|
be issued if the line power were restored. The same bits in the EPOW
|
|
error log that specified the type of WARN_POWER EPOW generated would be
|
|
set in the EPOW_RESET error log to indicate the specific EPOW event that
|
|
was reset.</para>
|
|
|
|
<para>Systems that do not support an EPOW interrupt would generally be
|
|
unable to support the EPOW action codes 5 and 7. In those cases, there
|
|
could not be an EPOW event to indicate a loss of power. However, after
|
|
power were restored, generating the EPOW_RESET EPOW would indicate that
|
|
the system had lost power previously and the power had been restored. The
|
|
EPOW_RESET should only be used in this way if the system is unable to
|
|
generate an EPOW class 5 or class 7.</para>
|
|
|
|
</section>
|