Service Indicators

This chapter defines service indicators (note that "indicators" are often referred to as "LEDs," since LEDs are currently one of the most common implementations of indicators) relative to:

- Which service indicators may be exposed to an OS and which may not
- The usage model for service indicators, regardless of whether they are exposed to the OS or not
General This section gives some general background information required to understand the service indicator requirements. The service indicator requirements can be found starting in .
Basic Platform Definitions The following are the definitions of some of the terms used in this architecture. See also for additional terms.
“Enclosure”, Packaging, and Other Terminology

To abstract specific packaging differences between different products, this architecture uses a number of terms that denote a unit of packaging. The term enclosure means something different depending on the product line. Generally, an enclosure is an entity that can be unplugged and removed from the system, but it may comprise the entire system, and it generally encloses other FRUs. It is, however, possible for a FRU to contain one other FRU without being an enclosure. See below for more information. The concept of the enclosure is central to this architecture, because the enclosure provides the anchor point for the Enclosure Fault, Enclosure Identify, and (when applicable) Error Log indicators. Depending on the product line, an enclosure is one of the following:

- For a blade system, a base blade plus any attached sidecars. A sidecar is a blade that plugs into a blade slot, but which is physically connected to the base blade and cannot be removed without also removing the base blade and any other attached sidecars. The Enclosure Identify indicator is located on the base blade; sidecars do not have an Enclosure Identify indicator.
- A stand-alone computing box, such as a deskside unit.
- A separately powered box that attaches to a stand-alone computing box (for example, an I/O expansion tower).
- For a rack system, a drawer or partial drawer, with its own power domains, within a rack system (but not a chassis, in blade system terms).

FRUs that have one or no internal FRUs are a possible exception to the above definition of enclosure.
The general requirements are that the enclosing FRU:

- Does not need an Error Log indicator (see also Requirement ),
- Implements the full xipSIA Lightpath architecture, including FRU Identify,
- Has the enclosing FRU Fault/Identify and internal (if any) FRU Fault/Identify indicators visible from the outside of the system the same way that enclosure indicators would be, and
- Rolls up the FRU Fault/Identify indicators to the next level of indicators when there is a next level (for example, chassis level indicators).

Examples of the types of FRUs that the xipSIA architecture team might approve as non-enclosures are:

- Appliance drawers (“appliance” meaning that there are no field serviceable parts inside).
- Appliance blades, except if they require an Error Log indicator.
- A power supply which comprises two or fewer FRUs.
- Fans, but not fan assemblies when the fan assemblies have three or more Fault indicators.

In addition, the term System Enclosure (also known as a Primary Enclosure) is used to denote the enclosure of a system that contains the one and only Error Log indicator (previously known as the System Information (Attention) indicator) for the system. An enclosure that is not a System Enclosure is called a secondary enclosure. The System Enclosure is expected to be one that contains at least some of the system processors for the platform. In this chapter, the term chassis refers to a blade system chassis.

Other terminology used in this chapter includes:

activate — To activate an indicator (physical or virtual) means to set it to a non-off state (blink, blip, or on). An indicator does not need to be in the off (deactivated) state prior to being activated (for example, a second request to activate an already active indicator is also considered an activation of that indicator).

active state — An indicator in an active state is in a non-off state. Different indicator types can be set to different sets of active states.
For each of the following indicators, the following states are applicable in the indicator active state (see for more detail on when each state is applicable and for the conditions under which a state transition is made):

- FRU Identify: blink
- FRU Fault: on
- Blue Enclosure Identify: on or blink
- Enclosure Fault: on or blip
- Error Log: on
- Blue Rack Identify: on
- Blue Row Identify: on

blip — A blink state with a short duty cycle, used in the “remind” state for Enclosure Fault indicators. See also Requirement .

Chassis Enclosure Identify — An Enclosure Identify indicator at the blade system chassis level.

CRU — See FRU.

deactivate — To deactivate an indicator (physical or virtual) means to set it to the off state. Deactivating a virtual indicator may or may not deactivate the physical indicator associated with that virtual indicator (see ).

Enclosure Fault — An amber indicator which indicates, when activated, that there is an active FRU Fault indicator in the enclosure.

Enclosure Identify — An indicator that is used to identify an enclosure in an installation or an enclosure in a group. This indicator is blue in color and is turned on in the active identify state.

FRU — Field Replaceable Unit. In this chapter, also used to mean CRU (Customer Replaceable Unit).

FRU Fault — An amber indicator that is used to point to a failing FRU in an enclosure.

FRU Identify — An amber indicator that is used to identify a FRU in an enclosure or a place where a FRU is to be plugged (for example, for an upgrade operation).

Guiding Light Mode — A platform implementation that provides FRU Identify indicators for identifying failing FRUs. See for more information.

ID — Shorthand used in some places (mainly figures) in this chapter for “Identify” or “Identify indicator”.

Lightpath Mode — A platform implementation that provides FRU Fault indicators as the general way to identify failing FRUs. See for more information.

not visible to the OS — See transparent to the OS.
primary level indicators — The enclosure level indicators: the blade level, for blade systems, or the drawer level, for non-blade systems.

roll-down — This term is not used by this architecture, but some people refer to roll-up as the action of activating a higher level indicator and roll-down as the action of deactivating a lower level indicator. This architecture uses roll-up for both activation and deactivation. See roll-up.

roll-up — The action of activating a higher level indicator when a lower level indicator is activated, and deactivating it when all the lower level indicators that roll up to it are deactivated. For example, if a FRU Fault indicator is activated, it rolls up and turns on the Enclosure Fault indicator, and when the last FRU Fault indicator in an enclosure is deactivated, the Enclosure Fault indicator for that enclosure is deactivated.

secondary level indicators — The indicators on levels below the primary level and above the FRU level.

SFP — Service Focal Point. See also .

Error Log — An amber indicator that indicates that the user needs to look at the error log or problem determination procedures in order to determine the cause.

tertiary level indicators — The FRU level indicators.

transparent to the OS — Indicators whose state cannot be modified or sensed by OS or application level software. For example, power supply Fault indicators.

turn off — To turn off a physical indicator means to set it to the off state. Turning off a virtual or logical indicator may or may not turn off the physical indicator, depending on the state diagram for the physical one (see ).

visible to the OS — Indicators whose state can be modified or sensed by OS or application level software. For example, PCI Hot Plug indicators. See also .
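The per-indicator active states in the glossary above can be captured in a small table. The following sketch is illustrative only; the names and structure are assumptions, not part of this architecture:

```python
# Hypothetical sketch of the active-state sets described in the glossary.
# All identifiers here are illustrative assumptions.
OFF, ON, BLINK, BLIP = "off", "on", "blink", "blip"

# Active (non-off) states permitted for each indicator type.
ACTIVE_STATES = {
    "fru_identify":       {BLINK},
    "fru_fault":          {ON},
    "enclosure_identify": {ON, BLINK},  # blue
    "enclosure_fault":    {ON, BLIP},   # blip is the "remind" state
    "error_log":          {ON},
    "rack_identify":      {ON},         # blue
    "row_identify":       {ON},         # blue
}

def activate(indicator_type: str, state: str) -> str:
    """Validate that a requested active state is legal for this indicator type."""
    if state not in ACTIVE_STATES[indicator_type]:
        raise ValueError(f"{state!r} is not a valid active state for {indicator_type}")
    return state
```

A controlling entity would reject, for example, a request to blink a FRU Fault indicator, since its only architected active state is on.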
Service Indicator Visibility and Transparency to the OS An indicator is said to be transparent or not visible to the OS when its state cannot be modified or sensed by OS or application level software (for example, power supply Fault indicators). An indicator is said to be visible to the OS when its state can be modified and sensed by OS or application level software (for example, PCI Hot Plug indicators). Requirements on visibility can be found in .
Service Indicator

A service indicator is defined as any indicator that is used in the course of servicing a system. The intent of service indicators is not, in general, to increase system Error Detection and Fault Isolation (EDFI), but rather to guide the user in the performance of a service action. Usages include (but are not limited to):

- Dynamic Reconfiguration (LoPAR indicator type 9002), to indicate the status of DR operations on a Field Replaceable Unit (FRU). More information on DR indicators can be found in . This indicator is amber in color (the term “amber” is used in this chapter to mean any wavelength between yellow and amber), except for some legacy implementations which combined this indicator with the power indicator, where the color was green.
- An indication of a fault condition of a FRU (LoPAR indicator type 9006, when OS visible). This indicator is amber in color. The FRU Fault indicator is handled differently by the platform based on whether the platform is a Lightpath Mode or Guiding Light Mode platform:
  - For Guiding Light Mode platforms, FRU Fault indicators are transparent to the software and therefore have some very specific requirements relative to their very localized behavior. In this case, although the FRU Fault indicators themselves are transparent to the software, the associated failure that would activate a FRU Fault indicator is available to the software that handles serviceable events.
  - For Lightpath Mode platforms, a FRU Fault indicator is available to the software and is activated by the detector of the error. In addition, the FRU Fault rolls up to an Enclosure Fault indicator.
- A system-wide indication of a fault or some condition needing attention in the system. An Error Log indicator (LoPAR indicator type 9006) is an example of an OS-visible indicator of this class (for a definition of the visibility or transparency of an indicator, see ).
The Error Log indicator is a flag to the user that there is something in the system needing attention, and therefore a starting point indicating that they should begin the isolation procedures to determine what needs attention. In a partitioned system, the physical Error Log indicator (when the term “virtual” does not appear before “Error Log,” the text refers to the physical Error Log indicator) may be the logical OR of individual virtual Error Log indicators: one virtual Error Log indicator per logical partition, and one for each other separate entity that is non-partition related. This indicator is amber in color. Other usages include:

- An indication of an Identify (locate) operation. An Identify indicator (LoPAR indicator type 9007) is an example of an indicator of this class. These indicators may or may not be visible to an OS. In this capacity, the indicator is activated (for a definition of what “activate” means, see ) at the user’s request in order to help them locate a component in the system (for example, a FRU, a connector, an enclosure, etc.). This indicator is amber in color, except for the Enclosure, Rack, and Row Identify indicators, which are blue in color.
- An indication of the power state of an entity. This indicator is platform controlled and is transparent to the OS(s). In addition to the power state, this indicator may be used to indicate a power failure or fault. This indicator is green in color.
- Environmental indicators, such as ambient temperature too high. These are transparent to the OS.
- Hardware-only indicators, such as Ethernet activity indicators. These are transparent to the OS.
Service Indicator Modes

There are two modes in which a platform can operate relative to service indicators: Lightpath Mode and Guiding Light Mode. Any particular platform operates in one and only one of these modes. A component (hardware, firmware, or software) that is designed to be used in both Lightpath Mode and Guiding Light Mode platforms needs to be able to operate in both modes. For guidance on which mode a platform should be designed to operate in, see . The following sections give an overview of these two modes. For the specific requirements of each mode, see .
Lightpath Mode

Lightpath Mode specifies a platform implementation of service indicators much like what industry-standard servers originally implemented with Lightpath, except that FRU indicators also implement an Identify state along with the Fault state. The Identify state overrides the Fault state while the Identify is active for an indicator, and the indicator is put back into the Fault state, if one is pending, when the Identify is removed. A summary of Lightpath Mode is as follows (see for detailed requirements):

- FRU Fault indicators are used as the general way to identify failing FRUs. The physical indicator that implements the Fault indicator states also implements the Identify indicator states (that is, a FRU Fault indicator is also a FRU Identify indicator for the same FRU). This mode is essentially a superset of Guiding Light Mode.
- FRU Fault indicators are presented to the OS for FRUs for which the OS image is expected to detect errors for either the entire FRU or part of the FRU. In the latter case, this represents a shared FRU, in which case the FRU Fault indicator is virtualized so that one partition cannot view the setting by another partition, which would allow a covert storage channel (see also ). The OS and firmware are responsible for activating the FRU Fault indicator for a FRU for which they detect an error. Fault indicators are reset by the service action on the failing part that they represent.
- FRU Identify indicators are presented to the OS for FRUs that are fully owned by the OS image. They may also be presented for FRUs that are partially owned by the OS image. Ownership of a FRU by the OS image is defined as the condition of the FRU being under software control by the OS, a device driver associated with the OS, or application software running on the OS.
In the partially owned case, this represents a shared FRU, in which case the FRU Identify indicator is virtualized so that one partition cannot view the setting by another partition, which would allow a covert storage channel (see also ).

- Connector Identify indicators are presented to the OS for connectors that are fully owned by the OS image. Ownership of a connector by the OS image is defined as the condition of the connector being under software control by the OS, a device driver associated with the OS, or application software running on the OS.
- Enclosure Identify indicators are available to the OS when the OS fully owns a FRU in the enclosure. This indicator is virtualized in a partitioned system, so that one partition cannot view the setting by another partition, which would allow a covert storage channel.
- The Error Log indicator, or a virtual copy thereof (for LPARed platforms), is available to each OS image.

See Requirement for requirements on activation of the Fault indicators. The Triple-S UI, when implemented, also adds additional requirements to Lightpath Mode implementations. See .
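The Identify-overrides-Fault behavior described for Lightpath Mode can be sketched as a tiny state machine. This is a minimal illustration under assumed names, not an implementation from this architecture:

```python
# Sketch of a Lightpath FRU indicator: one physical LED backs both the
# logical Fault and Identify indicators. Identify (blink) overrides Fault
# (on), and a pending Fault is restored when Identify is deactivated.
# Class and attribute names are illustrative assumptions.
class FruIndicator:
    def __init__(self):
        self.fault = False     # logical FRU Fault indicator state
        self.identify = False  # logical FRU Identify indicator state

    @property
    def physical(self) -> str:
        """Physical LED state derived from the two logical indicators."""
        if self.identify:
            return "blink"     # Identify overrides any pending Fault
        if self.fault:
            return "on"
        return "off"
```

With a pending Fault, activating Identify changes the LED to blink; deactivating Identify returns the LED to the Fault (on) state rather than off.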
Guiding Light Mode

A summary of Guiding Light Mode is as follows (see for detailed requirements):

- FRU Identify indicators are used as the way of identifying FRUs in service procedures like repair, reconfiguration, and upgrade. FRU Identify indicators are activated and deactivated by the user via a user interface, to identify the FRU(s) involved in the service procedure.
- FRU Identify indicators are presented to the OS for FRUs that are fully owned by the OS image. Ownership of a FRU by the OS image is defined as the condition of the FRU being under software control by the OS, a device driver associated with the OS, or application software running on the OS.
- Connector Identify indicators are presented to the OS for connectors that are fully owned by the OS image. Ownership of a connector by the OS image is defined as the condition of the connector being under software control by the OS, a device driver associated with the OS, or application software running on the OS.
- Fault indicators are allowed but not required; if provided, they must be transparent to any OS image, and are reset by the service action on the failing part that they represent. To be transparent to an OS means that they cannot be controlled by the OS, nor will they interfere with any other indicator that is controlled by the OS.
- Enclosure Identify indicators are provided as part of the Identify roll-up. Enclosure Identify indicators are available to the OS when the OS fully owns a FRU in the enclosure. This indicator is virtualized in a partitioned system, so that one partition cannot view the setting by another partition, which would allow a covert storage channel.
- The Error Log indicator, or a virtual copy thereof (for LPARed platforms), is available to the OS.
Covert Storage Channels

A covert storage channel is a path between two entities that can be used to pass data outside the normal data sharing paths, such as LANs. For example, if two OS images were given access to the same physical indicator, and if each OS image could read the state of the indicator, then the indicator could become a single-bit covert storage channel between cooperating entities in the two OS images, used to pass data back and forth. This cannot be allowed, for security reasons, and therefore this architecture defines the concept of virtual indicators. A virtual indicator is provided to each OS image for each physical indicator that is shared between OS images. The physical indicator is activated when any virtual indicator for that physical indicator is activated, and deactivated when all virtual indicators for that physical indicator are deactivated. A general OS image can sense what it has set its own virtual indicator to, but cannot sense what the other virtual indicators are set to, and hence no covert storage channel exists. The exception to the shared access is by a trusted Service Focal Point (see for more details). For more information on how virtual indicators affect the physical indicator state, see the physical indicator state diagrams later in this chapter. An OS image in a partitioned system needs to realize that it may not have full control over all physical indicators to which it has access (that is, that the indicators may be virtualized in some cases), and in those cases it should not attempt to indicate to the user, via a user interface, the state of a physical indicator which is controlled by a virtual indicator. Virtual indicators are controlled by the OS for which they are generated, except that the platform may activate an OS’ virtual Error Log indicator if the partition in which the OS resides abnormally terminates.
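The virtual-indicator rule above can be sketched in a few lines: the physical indicator is the logical OR of the per-partition virtual indicators, and a partition can sense only its own virtual state. The class and method names below are assumptions for illustration:

```python
# Sketch of a shared (virtualized) indicator. The physical state is the OR
# of all virtual indicators; a partition senses only its own virtual state,
# so no covert storage channel exists between partitions.
# All names are illustrative assumptions.
class SharedIndicator:
    def __init__(self):
        self._virtual = {}  # partition id -> virtual indicator state (bool)

    def set(self, partition: str, active: bool) -> None:
        """A partition activates or deactivates its own virtual indicator."""
        self._virtual[partition] = active

    def sense(self, partition: str) -> bool:
        """A partition sees only what it set, never the other virtuals."""
        return self._virtual.get(partition, False)

    @property
    def physical(self) -> bool:
        """Physical indicator: active when any virtual indicator is active."""
        return any(self._virtual.values())
```

Note that a partition's sensed state can differ from the physical state: after it deactivates its own virtual indicator, the physical indicator remains active if any other partition's virtual indicator is still active.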
Service Focal Point (SFP) and Service Partition

The Service Focal Point (SFP), when it exists, is ultimately the exclusive common point of control in the system for handling all service actions which are not resolved otherwise (for example, via Fault indicators). It interfaces with the error log where all the serviceable events from the various OS and service processor diagnostics are stored. The SFP, among other things, allows the user to reset the Error Log indicator, controls the activation and deactivation of the FRU, connector, and Enclosure Identify indicators, and allows the clearing of the service actions in the error log. The SFP shares access to some of the same indicators as one or more OS images, but needs access to the physical indicator state, and sometimes to the state of all the virtual indicators for that physical indicator. If the SFP in a partitioned system were implemented on an OS image that runs non-trusted applications, then the SFP partition could not be given access to the physical and other OS’ virtual service indicators, or covert storage channels would exist (see ) between the SFP partition and the other OS partitions. This architecture assumes that the SFP is implemented as a trusted or privileged entity which does not allow non-trusted applications to run on the same OS image as the SFP, and therefore covert storage channels are not considered to exist between the SFP’s privileged OS image and other OS images in the system. The SFP may also be implemented as an entity separate from the system being monitored. A system management entity like an HMC interfacing to the platform via firmware interfaces, or an external system management entity, are examples of such implementations. For Lightpath Mode, the Triple-S UI is a user interface that is associated with an SFP. See . The platform’s physical indicators are accessible to the SFP through the normal indicator interface (LoPAR indicator types 9006 and 9007).
Logical Indicators vs. Physical Indicators

A physical indicator is, in many cases, used to represent several logical and/or virtual indicators. For example, a physical FRU indicator can be used in Lightpath Mode to represent both a FRU Identify indicator and a FRU Fault indicator. The hardware/firmware that implements the physical indicator’s state machine is the entity which knows about the combining of the logical and virtual indicators into the physical one; higher level software (OS and applications) that is given control of a logical or virtual indicator is only aware of the control of that logical or virtual indicator, and may not even be able to sense the state of the physical indicator (that is, it can only sense the state of its own logical or virtual ones). The physical indicator state diagrams in indicate how logical and virtual indicators are merged into the physical ones. See also relative to virtual indicators.
Machine Classes and Service Strategy

Two broad classifications of computer implementations are defined here for purposes of defining service indicator implementations. The following table shows the comparison of these classes.

Machine Classifications and Service Characteristics

| Characteristic | Simple Class | Complex Class |
|---|---|---|
| Number of FRUs | Few | Many |
| Servicing performed by | Customer, generally | CE more than customer |
| Deferred maintenance | Very little | Enabled as much as possible (see note) |
| Concurrent maintenance | Generally limited to redundant components (fans, power supplies) and I/O devices | Generally a higher level of concurrent maintenance |
| Value of a FRU Fault indicator | High | Questionable, due to the complexity of the system |
| FRU Fault indicator implementation | Realistic | Complex, given the higher level of deferred and concurrent maintenance |
| Console interface | Rare | Standard |
| Platform service mode | Lightpath Mode platform | Guiding Light Mode platform, or Lightpath Mode with Triple-S UI |

Note: Deferred maintenance is one of the big drivers toward use of Guiding Light Mode or Lightpath Mode with Triple-S. Having many Fault indicators active at one time (FRUs waiting for service actions) can lead to confusion when service is being performed.
Determining a platform’s classification, and therefore the service mode of the platform, is dependent on the product requirements and is beyond the scope of this architecture, but considerations might include:

- The RAS requirements for the platform. The considerations for this come from .
- The mixture of machines expected in the environment. Although Lightpath Mode and Guiding Light Mode both contain the identify capability, which could be considered the common denominator for servicing in a mixed environment, there may be more Lightpath Mode platforms in the environment for which a new platform design is targeted, and it might therefore be desirable to make that new platform operate in Lightpath Mode for that reason.
General Information about Service Indicators

Indicators may serve multiple uses, but only as defined by this architecture. For example, a physical indicator used for a FRU is used for both the FRU Fault and FRU Identify indicators. Non-architected usages of an architected indicator are specifically disallowed by this architecture. In some cases, an indicator may not be directly visible to the user without removing covers, components, etc. In this case, one or more indicators that are higher in the hierarchy are required to be activated in conjunction with the target indicator. This functionality is called indicator roll-up. Due to the hierarchy, there might be multiple indicators that get rolled up into a single indicator. The platform (not the OS) is responsible for indicator roll-up. An example of a roll-up is that on the front of an enclosure in a rack there is a summary LED that summarizes the Identify LEDs within or on the back of the enclosure, and the multiple enclosure summary LEDs are in turn summarized at the rack level with a light on the top of the rack. (Note that the enclosure is sometimes called the “unit,” but a unit is not necessarily a drawer and a drawer is not necessarily a unit, so the term “unit” is not used here. Also note that an enclosure might be a drawer in a rack or might be part of a drawer; for example, some I/O drawers consist of two separate and independent enclosures, so there may sometimes be multiple enclosure indicators per rack drawer. See also .) Another example of a roll-up indicator is the Enclosure Fault indicator on each enclosure in Lightpath Mode platforms, which summarizes the Fault indicators within the enclosure. These roll-up indicators are transparent to the OS, and sometimes to the firmware, with the exception that the enclosure level Identify indicators (or virtual versions thereof, in the case of partitioned systems) may be accessible to the OS via the 9007 indicator type.
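The roll-up behavior described above reduces to a simple rule: a summary indicator is active whenever any indicator that rolls up to it is active, and goes off only when the last of them goes off. A minimal sketch, with illustrative names that are assumptions rather than part of this architecture:

```python
# Sketch of indicator roll-up: the enclosure-level summary indicator is
# the OR of the FRU-level indicators that roll up to it. The platform
# (not the OS) would maintain this relationship. Names are assumptions.
class Enclosure:
    def __init__(self, fru_names):
        # One FRU Fault indicator per FRU, all initially off.
        self.fru_fault = {name: False for name in fru_names}

    def set_fru_fault(self, name: str, active: bool) -> None:
        self.fru_fault[name] = active

    @property
    def enclosure_fault(self) -> bool:
        """Roll-up: active while any FRU Fault indicator is active."""
        return any(self.fru_fault.values())
```

The same rule applies recursively: enclosure summary indicators could themselves roll up to a rack-level indicator as the OR of the enclosure states.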
The indicators that are provided for roll-up from FRU to enclosure to rack are identified by this architecture. Platforms may have unique indicators which are not visible to the OS and which are not defined by this architecture. These must not share a physical indicator with any architected indicator, including indicators in the roll-up path, except as explicitly allowed by the requirements in this architecture. In addition to the roll-up to a higher level indicator for visibility, the platform may also provide duplicate indicators for some of the indicators. For example, there may be a front and a rear indicator for the enclosure indicators. These duplicate indicators are not defined by this architecture, except that, as for roll-up indicators, the platform is responsible for controlling any duplicate indicators and for not making the duplicates visible to the controlling entities. Finally, FRU indicators are required to be visible to the user during a service action. This may require, for example, that the indicator be able to be lit after power is removed from the system, requiring storage of power on the component with the indicator (for example, via a capacitor) and activation of the indicators by a push button by the user. An OS image is given access to the FRU Identify indicators when the OS image fully owns the resources, and is given access to the Enclosure Identify indicator for any enclosure in which the OS image fully owns any resource. An OS image is given access to the FRU Fault indicators when the OS image owns all or part of the FRU. The Enclosure Fault indicators are roll-up only indicators, and access to these indicators is not given to the OS. In a partitioned system (logical or physical), there may be several virtual Enclosure Identify indicators and one physical Enclosure Identify indicator.
In this case, the OS images are only given access to their copy of the virtual Enclosure Identify indicator, and do not have direct access to the physical Enclosure Identify indicator. Activating any virtual Enclosure Identify indicator which is associated with an enclosure activates the physical one for that enclosure (if not already activated). Turning off the last virtual Enclosure Identify indicator for an enclosure turns off the physical one for that enclosure, provided all other Identify indicators in the enclosure are also off. If software in a partition senses the state of the virtual Enclosure Identify indicator, it needs to take into consideration that it may be seeing the virtual state and not the real state of the indicator; the virtual state is what the partition set the indicator to, and this is not necessarily what the physical indicator is actually displaying. The Error Log indicator is located on the System Enclosure (the CEC enclosure) and is used to indicate that there was a failure in the system. This indicator may also be used by the system to indicate that some other attention is needed in the system. This Error Log indicator is the starting point for the determination of the necessary action by the user. In a partitioned system (logical or physical), there may be several virtual Error Log indicators and one physical Error Log indicator. Activating a virtual Error Log indicator activates the physical one. Turning off the last virtual Error Log indicator turns off the physical one. If software in a partition senses the state of the Error Log indicator, it needs to take into consideration that it may be seeing the virtual state and not the real state of the indicator; the virtual state is what the partition set the indicator to, and this is not necessarily what the physical indicator is actually displaying. For Guiding Light Mode platforms, the FRU Identify indicators are the primary method for pointing to failing FRUs.
For Lightpath Mode platforms, it is expected that the FRU Identify indicators will be used as a secondary assistance for FRU fault identification (the FRU Fault indicators being the primary). In both cases, the FRU Identify indicators can be used to assist with such things as identifying where an upgrade should be inserted. The general rules for activation and deactivation of indicators can be found in Requirements and , and more explicit requirements for individual indicators in the state diagrams in . When the Triple-S UI is implemented, see also . This architecture assumes that the control of multiple users doing identify operations at the same time is under procedural control, and is not handled or controlled in any way by this architecture, the OS, or firmware. For Guiding Light Mode platforms, if a FRU contains a Fault indicator, then the Fault indicator is transparent to the OS, and control of the FRU-level Fault indicator is entirely up to the FRU or to some OS-transparent method. For example, some power supplies contain a Fault indicator that does not get reported directly to the system controlling entity and which is turned off by a button on the power supply that is pushed when the service is complete.
Secondary Light Panels

A secondary light panel may be used to house roll-up indicators, as indicated for the “intermediate” level or “secondary” level indicators in and . These panels may also house other indicators which would otherwise not have a home (for example, an over-temperature indicator). Secondary light panel indicators are not to be used as a replacement for FRU-level indicators. However, if an indicator is not directly visible when the unit is placed into the service position (for example, blocked by covers, baffles, cables, etc.), then the secondary light panel is one implementation to get around this restriction (other implementations may exist, for example light pipes, etc.).
Group Identify Operation

In some systems it may be desirable to identify a set of enclosures as being part of a group. This is called a group identify operation and can be performed by activating the appropriate Enclosure Identify indicators. For platforms or systems that consist of multiple enclosures, it may be necessary to change the state of one enclosure before servicing another enclosure. For example, a system drawer (primary enclosure) may need to be powered down before servicing an I/O drawer (secondary enclosure). It may be useful in this case for the servicer to be able to identify the various enclosures that are linked. In such implementations, the enclosures should be designed with a method to activate the Group Identify function, with the “group” being all linked enclosures. One implementation of this is to put a pushbutton in proximity to the blue Enclosure Identify indicator, which is then used to activate the blue Enclosure Identify indicators of all connected enclosures, and subsequently to deactivate all of them. It is suggested that, with this implementation, each push of the button toggle the Group Identify function for this set of enclosures. If it takes a while to activate all the blue Enclosure Identify indicators in the group, it may be useful to give the user feedback that the button has been pressed. One way to do this is to put the blue Enclosure Identify indicator next to the pushbutton into the blink state (momentarily) until all the other blue Enclosure Identify indicators in the group have been activated, and when that is complete, to put this indicator into the Identify state (on solid).
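The suggested pushbutton behavior above can be sketched as follows. All names here are illustrative assumptions; the momentary blink feedback is modeled only coarsely, since a real implementation would be driven by hardware timing:

```python
# Sketch of the Group Identify pushbutton: each press toggles the blue
# Enclosure Identify indicators of all linked enclosures. The local
# indicator blinks while the rest of the group is being activated, then
# goes solid on. Names and structure are assumptions for illustration.
class EnclosureGroup:
    def __init__(self, enclosure_ids):
        self.identify = {eid: "off" for eid in enclosure_ids}
        self.group_active = False

    def press_button(self, local_id: str) -> None:
        """Toggle the Group Identify function for all linked enclosures."""
        self.group_active = not self.group_active
        if self.group_active:
            # Feedback: blink the local indicator while the others activate.
            self.identify[local_id] = "blink"
            for eid in self.identify:
                if eid != local_id:
                    self.identify[eid] = "on"
            # All other enclosures activated: go to solid on (Identify state).
            self.identify[local_id] = "on"
        else:
            for eid in self.identify:
                self.identify[eid] = "off"
```

One press activates the Identify indicators of every linked enclosure; a second press deactivates them all.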
System-Level Diagrams The following figures are conceptual diagrams showing indicator roll-ups: . . .
Representation of the Indicators -- Lightpath Mode Platform
Representation of the Indicators -- Guiding Light Mode Platform
Representation of the Indicators -- Rack System
Note: Guiding Light Mode platform shown, with optional Enclosure Fault indicators. For Lightpath Mode platforms, FRU Fault indicators would always exist and would roll up to the Enclosure Fault indicator, and additionally, the Rack and Row Identify indicators would have an additional Fault indicator (not shown) and the Fault indicators at the enclosure level would roll up to those.
Service Indicator Requirements Service indicators are required on all platforms. R1--1. All platforms must implement either the Lightpath Mode or the Guiding Light Mode of service indicators, and all components and enclosures (the primary enclosure and any secondary enclosures (for example, I/O expansion drawers)) within the platform must be operating in the same mode. R1--2. Indicators defined by this architecture must not be used for any purpose other than what is specified by this architecture, and only with the specific states defined by this architecture. R1--3. All platforms must provide the “ibm,service-indicator-mode” property in the Open Firmware Device Tree root node.
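Requirement R1--1 implies a simple consistency check that platform firmware could perform across all enclosures. A minimal sketch follows; the function name, the enclosure representation, and the mode strings are assumptions for illustration, not values defined by this architecture or by the “ibm,service-indicator-mode” property encoding.

```python
VALID_MODES = {"lightpath", "guiding-light"}

def check_indicator_mode(enclosure_modes):
    """Verify that every enclosure in the platform operates in the same
    service indicator mode, per the single-mode requirement.

    enclosure_modes: mapping of enclosure name -> mode string.
    Returns the common mode, or raises ValueError on a mismatch.
    """
    modes = set(enclosure_modes.values())
    if not modes <= VALID_MODES:
        raise ValueError("unknown mode(s): %s" % (modes - VALID_MODES))
    if len(modes) != 1:
        raise ValueError("mixed indicator modes not allowed: %s" % modes)
    return modes.pop()
```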
Service Indicator General Requirements This section details requirements for indicators that are not specifically LoPAR indicator type 9006 or 9007 related. These requirements apply even if the platform does not present any 9007 indicators to the OS. This includes requirements for platform actions like roll-up. For 9006 and 9007 specific requirements, see . Requirements which are prefaced by “For Lightpath Mode platforms:” only apply to Lightpath Mode platforms. Requirements which are prefaced by “For Guiding Light Mode platforms:” only apply to Guiding Light Mode platforms. Requirements that are prefaced by neither apply to both Lightpath Mode and Guiding Light Mode platforms. Components which are designed to work in both Lightpath Mode and Guiding Light Mode platforms need to be able to comply with both the Lightpath Mode and Guiding Light Mode sets of requirements, as well as the requirements that apply to all.
Fault Detection and Problem Determination Requirements There are two general classifications of problems which are indicated by Service Indicators: An indication for FRUs that have failed and need to be replaced An indication of other system problems that may be causing performance degradation or which might cause failures in the future, for example: A FRU that is predicted to fail (may be treated as a failing FRU by some implementations) An over-temperature condition A loss of redundancy that is not caused directly by a FRU failure (for example, greater than 100% of the base power being used) A configuration problem such as a missing resource, a resource plugged into the wrong slot, or an invalid configuration The general model for use of the Error Log Previously called the System Information (Attention) indicator. and Enclosure Fault indicators is to indicate problems as follows: Activation of either the Error Log or Enclosure Fault indicators is accompanied by a log entry in an error log that can be queried by the user Activation of the Error Log indicator is used when the user needs to perform some procedure, or acknowledge some condition, prior to taking corrective action For most types of problems, this requires the user to look into the error log at the start of the procedure In some cases (generally for more common or more urgent problems), additional indicators may be provided by a system and activated to allow the user to determine the problem without looking into the error log (these additional indicators are generally not allowed to be the same indicators as defined by this architecture, except as allowed by this architecture) The procedure performed by the user may include items like: Activation of FRU Identify indicators (for example, as in Guiding Light Mode systems) Removal and re-connection of cables, reseating of cards, etc. 
Activation of an Enclosure Fault (Lightpath Mode systems) is only allowed in the following cases: As an indication of the roll-up of a FRU Fault indicator In conjunction with a system error that prevents a FRU Fault indicator from being activated (this requires some other indication of the global failure problem, for example, an error code on an op panel) The following requirements define the actions to be taken by a system on the detection of a fault. R1--1. The detector of a fault condition must do the following: If a fault occurs which cannot be isolated appropriately without the user performing some procedure, then activate the Error Log indicator. If a fault occurs which can be isolated to a single FRU and if there exists a Fault indicator for the FRU, then activate that FRU Fault indicator, otherwise activate the Error Log indicator. If a fault occurs which cannot be isolated to a single FRU and if there exists a Fault indicator for the most likely FRU in the FRU list, then activate that FRU Fault indicator, otherwise activate the Error Log indicator. If a fault occurs which is isolated to a group of FRUs (called a FRU group) and if there exists a Fault indicator for each of the FRUs, then activate all the FRU Fault indicators, otherwise activate the Error Log indicator. See also . R1--2. (Requirement Number Reserved For Compatibility) R1--3. Service Indicators (Error Log, Fault, and Identify) must be activated appropriately to guide a user to or through a service action or procedure. R1--4. 
Service Indicators (Error Log, Fault, and Identify) must be deactivated appropriately, as follows: A Service Indicator activated by an entity must be automatically deactivated by that entity when that entity can determine that the activation is no longer necessary, or A Service Indicator must be automatically deactivated by the platform when the platform can determine that the activation is no longer necessary or may be necessary but will be redetected and therefore reactivated a reasonable time later, or A Service Indicator must be automatically deactivated by a service procedure which fixes the issue that caused the indicator to be automatically activated in the first place, or An Error Log or Identify indicator must be deactivated when a user requests it to be deactivated through a system-level user interface. For the Lightpath UI Base Enablement, as indicated in Requirement and . R1--5. For each activation of the Error Log and Enclosure Fault Indicators, one of the following must be true: If the platform is functional enough to allow it, then an associated entry must be made in an error log that can be queried by a user interface. In the case where the platform is not functional enough to allow logging of an error log entry, then there must exist a way for the user to determine the failure associated with the indicator activation (for example, an error code on an op panel on the system). Implementation Notes: Requirement is intentionally written generally enough so that different platform types have some latitude in implementation of Service Indicators. However, see the state diagrams, , for some explicit requirements for activation and deactivation of the various Service Indicators. Those state diagrams take precedence over Requirement . When the state diagrams and Requirement do not give explicit direction for implementation, implementers should consider compatibility with existing implementations when making decisions about activation and deactivation. 2. 
In Requirement , the physical indicator may not be turned off when deactivated from an OS interface (versus a system-level interface), if another entity outside of that OS also has the physical indicator activated. That is, if the physical indicator is the combination of several logical indicators. R1--6. The Error Log indicator must be activated only for Serviceable Events. Serviceable Events are platform, global, regional and local error events that require a service action and possibly a call home when the serviceable event must be handled by a service representative or at least reported to the service provider. Activation of the Error Log indicator notifies the customer of the event and the event indicates to the customer that there must be some intervention to rectify the problem. The intervention may be a service action that the customer can perform or it may require a service provider.
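The fault-detection decision procedure in Requirement R1--1 above can be summarized as a selection function: prefer FRU Fault indicators when the fault is isolated and the indicator exists, otherwise fall back to the Error Log indicator. The following is only a sketch; the data shapes and indicator naming are assumptions for illustration.

```python
def indicators_to_activate(fault):
    """Return the indicator(s) to activate for a detected fault.

    fault: dict with keys:
      needs_user_procedure (bool) - isolation requires a user procedure
      fru_list (list of str)      - candidate FRUs, most likely first
      has_fault_indicator (set)   - FRUs that have a Fault indicator
      is_fru_group (bool)         - fault isolated to a group of FRUs
    """
    if fault["needs_user_procedure"]:
        # Cannot be isolated without the user performing some procedure.
        return ["error-log"]
    frus = fault["fru_list"]
    have = fault["has_fault_indicator"]
    if fault["is_fru_group"]:
        # Activate all FRU Fault indicators only if every FRU has one.
        if frus and all(f in have for f in frus):
            return ["fru-fault:%s" % f for f in frus]
        return ["error-log"]
    # Isolated to a single FRU, or to a list: use the most likely FRU's
    # Fault indicator if it exists, else the Error Log indicator.
    if frus and frus[0] in have:
        return ["fru-fault:%s" % frus[0]]
    return ["error-log"]
```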
FRU-Level and Connector Indicator Requirements The indicators specified in this section represent the lowest level indicators in the indicator roll-up hierarchy. See also requirements in . For the Lightpath UI, see also the requirements in . R1--1. For Lightpath Mode platforms: All of the following must be true: A FRU Fault indicator must be implemented for every replaceable FRU, with the states of “off” and “on,” except for FRUs which are excepted in Requirement . Clearing of the FRU Fault indicator from the Fault state must be the result of part of the repair action and must be transparent to the OS(s) and SFP (that is, the OS or SFP is not required to automatically clear a FRU Fault indicator). The physical indicator which implements the FRU Fault indicator must also be an Identify indicator and follow the requirements for Identify indicators. R1--2. FRU indicators (Fault and Identify) must be visible to the user during a service action. Implementation Note: Requirement may require, for example, that the indicator be able to be lit after power is removed from the system, requiring storage of power on the component with the indicator (for example, via a capacitor) and activation of the indicators by a push button by the user (see Requirement for requirements on this implementation). Another example would be via the use of a light pipe from the indicator to a visible place. R1--3. 
For Guiding Light Mode platforms: If a FRU Fault indicator exists, then it must be transparent to the OS, SFP, and HMC and it must be independent of, and not physically combined into the same indicator with, any indicator defined by this architecture, including the setting of, resetting of, and displaying of the state of that indicator, except that a FRU Identify indicator may be activated to the Fault state (on solid) as a result of a FRU failure if all of the following are true: The failure that is being indicated must be a failure which prevents the user from activating the said FRU Identify indicator to the Identify state. Clearing of the FRU Fault indicator must be the result of part of the repair action and must be transparent to the OS, SFP, and HMC. Architecture Notes: For Guiding Light Mode platforms, the only FRU-level indicators that are allowed to be visible to an OS, SFP, or HMC are the FRU Identify indicators. For Guiding Light Mode platforms, the only Fault indicator that is allowed to be visible to an OS, SFP, or HMC is the Error Log indicator. Examples of the exception of the use of FRU and enclosure indicators in Requirement as an indication of a fault are: when the path for controlling an Enclosure Identify indicator or FRU indicator in that enclosure is broken, or when the power supply in the enclosure is broken and the indicator cannot be activated to the Identify state. In these cases the FRU and/or enclosure indicators may be activated (transparently) to the Fault state to indicate the failure, and would be returned to the Normal state as a result of the repair action that fixes the problem. R1--4. All platform designs, except very low end servers, The term “very low end servers” is not explicitly defined here, but is used to refer to implementations where FRU-level indicators cannot reasonably be implemented (for example, due to size constraints) or where the product can show explicit financial justification for not implementing. 
must include an Identify indicator for every FRU with the states of “off” and “blink,” except for the following classes of FRUs: If a device driver has access to some standard form of Identify/Fault indicators for the DASD devices it controls (for example, some standard form of enclosure services), then the platform does not need to provide FRU indicators for these devices. If a device driver has access to some standard form of Identify/Fault indicators for the removable media devices it controls (for example, some standard form of enclosure services), then the platform does not need to provide FRU indicators for these devices. External enclosures other than PCI expansion enclosures, and external devices (for example, keyboards, mice, tablets) that attach via cable to IOAs, do not require FRU indicators. Cables that connect from IOAs to the devices defined in parts , , and of this requirement do not require FRU indicators. Internal cables, interposers, and other FRUs which do not contain active components do not require FRU indicators. Implementation Note: Even though an item falls into the list of possible exceptions in Requirement , the designer of such a component should verify that leaving off the FRU Identify indicator from their component will not prevent the systems in which that component is used from meeting their serviceability requirements. R1--5. All FRU-level Identify indicators must implement the state diagram shown in , except that the Fault state is not required for Guiding Light Mode platforms. R1--6. All platforms must include an Identify indicator with the states of “off” and “blink” for every connector that is to be involved in an Identify operation. R1--7. 
FRU-level and connector-level indicators must be made visible to the OS(s) as follows, and must be made transparent otherwise: For Lightpath Mode platforms: FRU Fault indicators must be made visible to the OS for FRUs for which the OS image is expected to detect errors for either the entire FRU or part of the FRU. FRU Identify indicators must be made visible to the OS for FRUs that are fully owned by the OS image. Connector Identify indicators must be made visible to the OS for connectors that are fully owned by the OS image and for which a connector Identify indicator exists. Implementation Notes: In Requirement , for FRU Fault indicators that are shared, the FRU Fault indicator is virtualized, so that one partition cannot view the setting by another partition, which would allow a covert storage channel (see also and ). Ownership of a FRU or connector by the OS image is defined as being the condition of the FRU or connector being under software control by the OS, a device driver associated with the OS, or application software running on the OS. R1--8. An OS which activates a FRU Identify indicator must provide a method of deactivating that indicator. R1--9. (Requirement Number Reserved For Compatibility)
Enclosure-Level Indicator Requirements See also requirements in . For the Lightpath UI, see also the requirements in . R1--1. On the System Enclosure: The platform must implement an Error Log indicator and all of the following must be true: The states of “off” and “on” must be implemented and must be used for the Error Log function (Requirement Number Reserved For Compatibility) This indicator must roll-up to the rack indicator, when the rack indicator is implemented, and for blade implementations, to the Chassis Error Log indicator. The indicator must implement the state diagram shown in . The platform must provide a mechanism to allow the user to put the Error Log indicator into the off state. R1--2. Except for enclosures that contain only FRUs that are exempted from FRU-level indicators as specified by Requirement parts , , and , and which also do not have any Connector Identify indicators, the platform must implement an Enclosure Identify indicator on all enclosures, and all the following must be true: The states of “off,” “blink,” and “on” must be implemented and must be used for the Identify function. This indicator must roll-up to the rack indicator, when the rack indicator is implemented, and for blade implementations, to the Chassis Enclosure Identify indicator. The indicator must implement the state diagrams shown in and . R1--3. For Lightpath Mode Platforms: All the following must be true for the Enclosure Fault indicator: The platform must implement an Enclosure Fault indicator on each enclosure in which there exists at least one FRU Fault indicator. These indicators must implement the states of “off,” “on,” and “blip”. These indicators must implement the state diagram as shown in . These indicators must not be visible to any OS image. The platform must provide a mechanism to allow the user to put each Enclosure Fault indicator into the blip state. 
This indicator must roll-up to the rack indicator, when the rack indicator is implemented, and for blade implementations, to the Chassis Enclosure Fault indicator. Implementation Note: One way of achieving Requirement is to provide a pushbutton (for example, on the secondary indicator panel). R1--4. (Requirement Number Reserved For Compatibility) R1--5. (Requirement Number Reserved For Compatibility) R1--6. Enclosure-level indicators must be made visible to the OS(s) as follows, and must be made transparent otherwise: Enclosure Identify indicators must be made visible to the OS when the OS fully owns a FRU in the enclosure. The Error Log indicator must be made visible to each OS image. Implementation Notes: In Requirement , for indicators that are shared, the indicator is virtualized, so that one partition cannot view the setting by another partition, which would allow a covert storage channel (see also and ). Ownership of a FRU by the OS image is defined as being the condition of the FRU being under software control by the OS, a device driver associated with the OS, or application software running on the OS. R1--7. An OS which activates an Error Log indicator must provide a method of deactivating that indicator, when such an activation is not deactivated automatically as part of the service action. Implementation Note: Relative to Requirement , it is recommended that an OS that activates an Error Log indicator provide a way to deactivate that indicator, regardless of whether that indicator would be reset as part of a service action. R1--8. An OS which activates an Enclosure Identify indicator must provide a method of deactivating that indicator. R1--9. (Requirement Number Reserved For Compatibility) R1--10. 
For Guiding Light Mode Platforms: If a FRU Fault indicator exists, then it must not roll-up to the Enclosure Identify or Error Log indicator, and if there is such a requirement to roll-up such an indicator, then the enclosure must implement an Enclosure Fault indicator, with the same requirements as the Enclosure Fault indicator for Lightpath Mode platforms.
Rack-Level Indicator Requirements See also requirements in . R1--1. If a platform implements a rack-level indicator then all of the following must be true: The rack indicator must be transparent to the OS, SFP, and HMC. The rack indicator must be highly visible (as defined by the usability groups for distance and viewing angle) with covers in place. For Lightpath Mode: The rack tower indicator must implement the state diagram indicated in , , and . For the Guiding Light Mode: The rack tower indicator must implement the state diagram indicated in , , and if the optional Enclosure Fault indicators are implemented, then .
Row-Level Indicator Requirements R1--1. If a system implements a row-level indicator to roll-up a row of rack-level indicators, then the following must be true for these indicators: The indicator must be transparent to the OS, SFP, and HMC. For Lightpath Mode: This indicator must implement the state diagram indicated in , , and . For the Guiding Light Mode: This indicator must implement the state diagram indicated in , , and if the optional Enclosure Fault indicators are implemented, then .
Shared Indicator (Multiple Partition System) Requirements To avoid covert storage channels (see ), virtual indicators are required for physical indicators which are shared between OS images. R1--1. If a physical indicator (Fault or Identify) is shared between more than one partition, all the following must be true: Except where there is explicit trust between the partitions, the platform must provide a separate virtual indicator to each non-trusted partition for each shared physical indicator and must control the physical indicator appropriately, as indicated in the state diagrams in . If software in a partition senses the state of the virtual indicator, it must take into consideration that it is seeing the virtual state and not the real state of the indicator, with the virtual state being what the partition set the indicator to, and this is not necessarily what the physical indicator is actually displaying. The SFP must be given access (sense and set) to the physical FRU level indicators, and the platform must clear all the corresponding virtual indicators when physical indicator is cleared by the SFP. The SFP must be given access (sense and set) to the physical Error Log indicator, and the platform must not clear the corresponding virtual indicators when physical indicator is cleared by the SFP. Architecture Note: In Requirement , an example of “explicit trust” is where the sharing partitions are the SFP and one other partition, where the SFP is running in an OS where all the applications and drivers can be trusted to not open a covert channel to the other OS or application in that other partition. In Requirement , it may be possible for the SFP to get direct access to the virtual indicators, but such access is beyond the scope of this architecture.
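The shared-indicator rules above amount to: the physical indicator is the OR of the per-partition virtual indicators; a partition sensing its indicator sees only its own virtual state; and an SFP clear of a physical FRU-level indicator clears the corresponding virtual indicators, while an SFP clear of the Error Log indicator does not. A sketch of this model follows (class and method names are assumptions for illustration):

```python
class SharedIndicator:
    """Illustrative model of a physical indicator shared between partitions.

    kind: "fru-fault" or "error-log"; the difference matters only for the
    SFP clear semantics described in the shared-indicator requirements.
    """

    def __init__(self, kind):
        self.kind = kind
        self.virtual = {}  # partition name -> bool (virtual indicator state)

    def set_virtual(self, partition, active):
        self.virtual[partition] = active

    def physical_state(self):
        # The physical indicator is the OR of all virtual indicators.
        return any(self.virtual.values())

    def sense_virtual(self, partition):
        # A partition sees only its own virtual state, never the state set
        # by another partition (avoids a covert storage channel).
        return self.virtual.get(partition, False)

    def sfp_clear(self):
        if self.kind == "fru-fault":
            # Clearing the physical FRU indicator clears all virtuals.
            self.virtual = {p: False for p in self.virtual}
        # For "error-log", the virtual indicators are left untouched.
```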
Additional Indicator Requirements R1--1. A user interface which presents to a user the state of the Identify indicators or which allows the user to set the state of the Identify indicators, must be prepared for an indicator to disappear from the list of indicators available to the OS image (for example, a “no such indicator” response to a set request), and must provide the user with an appropriate message and recovery (for example, prompt the user whether they want to refresh the list of available indicators). R1--2. The color of indicators must be as follows: FRU Identify, FRU Fault, Enclosure Fault, Error Log indicators, and any roll-up indicators for Error Log (rack-level, blade system chassis-level, and row-level) must be amber. The Enclosure Identify indicators and any roll-up indicators for these indicators (rack-level, blade system chassis-level, and row-level) must be blue. R1--3. The blink rate of all Identify indicators which blink must be nominally 2 Hz (minimum 1 Hz) with a nominal 50% duty cycle. Implementation Note: The 1 Hz rate should not be used unless absolutely necessary. The 1 Hz rate is included to be consistent with the industry-standard SHPC specification, which specifies 2 Hz with 1 Hz minimum. R1--4. The “blip” rate for the Enclosure Fault indicators when in the “remind” state must be nominally 0.5 Hz with a duty cycle of 0.2 seconds on, 1.8 seconds off. R1--5. All indicator roll-up (activate and deactivate) must be controlled entirely by the platform and must be transparent to the OS, SFP, and HMC. R1--6. Duplicate indicators that are implemented to reflect the same state as another indicator in the system (for example, an indicator on the back of an enclosure that is to reflect the same visible state as the enclosure indicator on the front of the enclosure) must be transparent to the OS and must be kept synchronized by the platform. R1--7. The platform must provide a way to light all the indicators (Identify, Fault, Error Log, etc.) 
without any OS present, for test purposes (manufacturing, field service, etc.). R1--8. The hardware must provide the firmware a way to read the state of the indicators (that is, the register which controls the visual state, not the actual visual state) as well as to set the state of the indicators. R1--9. The platform must be designed such that permanently removing a FRU does not remove the capability to use the Identify indicator(s) remaining in the platform or affect any roll-up. R1--10. In reference to Requirement , if a capacitor and pushbutton are used to be able to activate the indicator after removal of the part, then all the following must be true: For Lightpath Mode platforms: Both the Identify and the Fault states must be supported, and the indicator must activate when the push button is depressed and must go off (with the remaining capacitor charge maintained) when the pushbutton is released (Identify state is displayed as “blink” and Fault state as “on”). For Guiding Light Mode platforms: The Identify state must be supported, and the indicator must activate (“blink”) when the push button is depressed and must go off (with the remaining capacitor charge maintained) when the pushbutton is released. There must be a green indicator next to the pushbutton, and the indicator must be turned on when the button is pressed and there is charge in the capacitor, and must be off when the button is not pressed. The capacitor must have the capability to store enough charge for two hours, and after that period must be able to light, for 30 seconds, the green indicator and enough other indicators to be able to identify any necessary group of FRUs (for example, four additional indicators if a group of four DIMM locations is to be identified simultaneously). Implementation Note: As part of Requirement , it is not necessary to roll-up any activated indicators to the next level when the button is pressed. R1--11. 
All indicators which are under standby power must work the way they do when full power is applied to the system, including all of the following: The indicators must continue to display the last state displayed when the system power went to standby-only power, unless the state is changed during the standby state by the user or by a service action. The changing of the state to the Identify state and then back to the previous state by the user must be supported, when that functionality is supported during full system power. Implementation Note: Internal to the platform firmware, it will most likely be required to have a common control point for all service indicators in order to meet the requirements and maintain the necessary state information. R1--12. Any secondary (intermediate) level roll-up indicator (see ) must behave as follows: Be blinking, if any Identify indicator that it represents is blinking Be on solid if any Fault indicator that it represents is on and no Identify indicator that it represents is blinking Be off if all indicators that it represents are off R1--13. The icons used for the following indicators, and any roll-up of the same, must be as follows (see the usability specifications for size, color, and placement): For the Error Log indicator: For the Enclosure Fault indicator: For the Enclosure Identify indicator:
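The timing values in Requirements R1--3 and R1--4 (2 Hz Identify blink at 50% duty; 0.5 Hz blip with 0.2 s on and 1.8 s off) and the secondary roll-up rule in R1--12 can be expressed directly as functions. A minimal sketch, with function names that are illustrative rather than architecture-defined:

```python
def identify_blink(t, freq_hz=2.0):
    """Identify blink: nominally 2 Hz (1 Hz minimum), 50% duty cycle.
    Returns True while the indicator should be lit at time t (seconds)."""
    period = 1.0 / freq_hz
    return (t % period) < (period / 2.0)

def enclosure_blip(t):
    """Enclosure Fault 'blip' (remind) state: nominally 0.5 Hz,
    with 0.2 seconds on and 1.8 seconds off per 2-second period."""
    return (t % 2.0) < 0.2

def secondary_rollup_state(identify_states, fault_states):
    """Secondary (intermediate) roll-up indicator state per R1--12:
    blink if any represented Identify indicator is blinking, else on
    solid if any represented Fault indicator is on, else off."""
    if any(s == "blink" for s in identify_states):
        return "blink"
    if any(s == "on" for s in fault_states):
        return "on"
    return "off"
```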
Blade Systems Chassis-level Indicator Requirements The following describes the chassis-level Error Log and Enclosure Identify indicator requirements for blade chassis implementations. These are basically the same as for the rack and row level indicators, except that the Enclosure Identify indicators are required to be able to be turned on/off by a user interface, unlike the rack and row level indicators. For the Lightpath UI, see also the requirements in . R1--1. The blade chassis must implement an amber Error Log indicator, with the state diagram indicated in . R1--2. The blade chassis must implement an amber Enclosure Fault indicator, with the state diagram indicated in . R1--3. The blade chassis must implement a blue Enclosure Identify indicator, with the state diagram indicated in .
Service Indicator State Diagrams The following state diagrams show the transitions and states for the service indicators in the system. Implementation Note: Activation of an indicator by a roll-up operation from a lower level indicator will prevent a user from turning off that indicator from a user interface without turning off the lower level indicator. It is recommended, where possible, that a user interface that allows the user to attempt to deactivate an indicator provide a message that the indicator cannot be deactivated when a roll-up to that indicator is active; that is, something better than just silently not turning off the indicator. Alternatively, the user can be shown that the option of turning off such an indicator is not available when a roll-up to that indicator is active (for example, by graying out the option on the user interface).
FRU or Connector Fault/Identify Indicator State Diagram
Notes: Not being available means the failure that is being indicated must be a failure which prevents the user from activating the Identify for the FRU. Transition to Fault state may occur if a failure occurs which would prevent the activation of the indicator into one of the Identify states. Not all FRU Fault indicators in an enclosure get activated like this simultaneously; only those that are directly involved with the fault (for example, the FRU Fault indicator associated with the indicator controller hardware) The OS is not expected to change an indicator from Fault to Normal, but is permitted to do so (providing that it has access to the indicator because it owns the resource). Transition from Fault to the Identify or Normal states by the OS may not be possible if a hardware fault causes a failure which prevents access to the indicator. Format on the above diagram of “xxxx,y” means a call to the set-indicator or ibm,set-dynamic-indicator RTAS call with an indicator token of “xxxx” and a state value of “y” (only the token applicable for the specific indicator causes a state transition). The 9002 Identify and Action are the same state. 9006 FRU-level indicators are only provided in Lightpath Mode platforms. Fault indicators may be virtualized, with several OS images and firmware given access to a virtual FRU Fault indicator which controls the same physical Fault/Identify indicator. These get combined as shown in the state diagram, above; all virtual Fault indicators basically get ORed together. This indicator may be forced to the Normal (off) state under certain circumstances (for example, see Requirement ). For the Lightpath UI, when implemented, other transition conditions are possible. See for requirements.
Error Log Indicator State Diagram
Notes: Format on the above diagram of “xxxx,y” means a call to the set-indicator or ibm,set-dynamic-indicator RTAS call with an indicator token of “xxxx” and a state value of “y” (only the token applicable for the specific indicator causes a state transition). This indicator may be forced to the Normal (off) state under certain circumstances (for example, see Requirement ). See Requirement . For the Lightpath UI, when implemented, other transition conditions are possible. See for requirements.
Enclosure Identify Indicator State Diagram for Scalable Systems
Notes: This indicator may be forced to the Normal (off) state under certain circumstances (for example, see Requirement ). This indicator is off at the end of POST. The states in this diagram overlay the corresponding states in . This figure represents the POST states and the after-POST states. The use of the Optional Identify state to indicate boot identify is only to be used for boot servers for scalable system nodes or blades of a blade system (for example, NUMA system nodes), and not for stand-alone systems or blades.
Enclosure Identify Indicator State Diagram
Notes: This indicator may be activated to the on state by any OS which is given access to the indicator per Requirements. For indicators that are shared by multiple OS instances, this indicator is virtualized (see and ). LoPAR compliant OSs are only given the capability to activate the Enclosure ID to the on state, not to the blink state. The blink state may be activated through an external platform management interface by a user request through that interface to blink the physical Enclosure ID. This indicator may be forced to the off state under certain circumstances (for example, see Requirement ). This indicator is off at the end of POST. A user is not allowed to deactivate the Enclosure ID if any FRU IDs are still active. A user request through a privileged user interface (for example, via an SFP) to set the physical Enclosure ID to off, forces any virtual Enclosure ID indicators that are active (on or blink) to their off state, but this does not override any FRU ID roll-ups (see Note ). For Scalable (NUMA) systems, the states in this diagram overlay the corresponding states in . This figure represents the after-POST states; the POST states are shown in . A virtual Enclosure ID can be activated or turned off by the 9007 indicator token for the target Enclosure ID.
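The notes above combine into a single resolution rule for the physical Enclosure Identify indicator: a management-interface blink request takes precedence, otherwise the indicator is on if any virtual Enclosure ID is on or any FRU ID rolls up, otherwise off; and a privileged off request clears virtual IDs but not roll-ups. A sketch with assumed names:

```python
def enclosure_id_state(virtual_on, fru_id_rollup, mgmt_blink_request):
    """Resolve the physical Enclosure Identify state.

    virtual_on: set of partitions whose virtual Enclosure ID is on
                (an OS can only request 'on', never 'blink')
    fru_id_rollup: True if any FRU Identify in the enclosure is active
    mgmt_blink_request: True if a user asked, via an external platform
                management interface, to blink the physical Enclosure ID
    """
    if mgmt_blink_request:
        return "blink"
    if virtual_on or fru_id_rollup:
        return "on"
    return "off"

def sfp_force_off(virtual_on, fru_id_rollup):
    """A privileged request to set the physical Enclosure ID off clears
    all active virtual Enclosure IDs, but does not override FRU ID
    roll-ups: the indicator stays on if a FRU ID is still active."""
    virtual_on.clear()
    return "on" if fru_id_rollup else "off"
```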
Enclosure Fault Indicator State Diagram
Notes: There is no direct activation or deactivation of this indicator by any OS. See Requirement and the Implementation Note below that requirement. This indicator may be forced to the Normal (off) state under certain circumstances (for example, see Requirement ). Activation of an Enclosure Fault indicator without activating a FRU Fault indicator within the enclosure is to be used only in exceptional cases where the FRU Fault cannot be activated. In such cases, the system is also required to provide further direction to the user on how to resolve the fault (for example, by providing an error code on an op panel on the system).
For Blade Systems: Chassis-level Error Log Indicator State Diagram
For Blade Systems: Chassis-level Fault Indicator State Diagram
For Blade Systems: Chassis-level Enclosure Identify Indicator State Diagram
Notes: This indicator may be forced to the Normal (off) state under certain circumstances (for example, see Requirement ). A user is not allowed to deactivate the chassis Enclosure ID while any FRU Identify or Blade Enclosure Identify indicators are still active (see state transition qualifiers in the above diagram). A user request to set the Chassis Enclosure ID to the Identify (blink) state temporarily overrides roll-up operations (roll-up operations set the indicator to the on state). A user request to change the state of the Chassis Enclosure ID cancels any previous user request against the same indicator, replacing the user requested state with the new state.
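The precedence rules above (a user Identify request temporarily overrides roll-up, and a user cannot turn the indicator off while roll-up is active) can be sketched as a small resolution function. This is an assumption-laden simplification for illustration, not the chassis firmware algorithm.

```python
# Sketch of Chassis Enclosure ID state resolution per the notes above.
# user_request is the most recent user request (a new request replaces
# any previous one): None, "blink", or "off".

def chassis_enclosure_id_state(rollup_active, user_request):
    """Return the resulting indicator state: 'blink', 'on', or 'off'."""
    if user_request == "blink":
        return "blink"       # user Identify temporarily overrides roll-up
    if rollup_active:
        return "on"          # roll-up from active FRU/Blade Identify wins
    return "off"             # no roll-up and no overriding user request

assert chassis_enclosure_id_state(True, None) == "on"
assert chassis_enclosure_id_state(True, "blink") == "blink"
# A user "off" request cannot deactivate the ID while roll-up is active:
assert chassis_enclosure_id_state(True, "off") == "on"
```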
Rack-level Error Log Indicator State Diagram
Rack-level Fault Indicator State Diagram
Rack-level Enclosure Identify Indicator State Diagram
Row-level Error Log State Diagram
Row-level Fault State Diagram
Row-level Identify State Diagram
Notes: A blinking Enclosure ID is assumed to be “active” for purposes of the Row Enclosure ID indicator state.
Requirements for 9002, 9006, and 9007 Indicators See for service indicator requirements that are not 9006 and 9007 specific. R1--1. When the platform presents a 9006 indicator to an OS, the following must be true: The platform must set the location code of the Error Log (9006) indicator and sensor to be the location code of the system, and this indicator or sensor must be the first one in the list of 9006 indicators or sensors. For Lightpath Mode platforms: The platform must set the location code of each FRU Fault indicator and sensor to be the location code of the component to which that indicator or sensor is associated. For every 9006 indicator, there must be a corresponding 9006 sensor which has the same index as the corresponding indicator. R1--2. When 9007 indicators are to be provided to an OS, the platform must implement the ibm,get-indices RTAS call and must present that call in the device tree for the OS, and an OS needing access to the 9007 indicators and sensors must use the ibm,get-indices call to get the indices of the 9007 indicators and sensors available to the partition at the time of the call. Software Implementation Notes: Relative to Requirement , due to Dynamic Reconfiguration, the indicators available at any point in time might be different from those available on a previous call to ibm,get-indices. 9007 indicators may need to be provided to the OS in the order in which they are best displayed to the user, because the OS or the UI may not reorder them (for example, sort them) before presenting them to the user. This is true regardless of the method of presentation to the OS (OF device tree or ibm,get-indices RTAS call). Relative to presentation order, see also Requirement R1--3.
If a platform provides any 9007 indicators to the OS, then the following must be true: The platform must set the location code of each Identify (9007) indicator and sensor (Enclosure, FRU, or connector) to be the location code of the enclosure, FRU, or connector to which that indicator is associated. The System Enclosure Identify (9007) indicator must be the first indicator in the list of 9007 indicators. For every 9007 indicator, there must be a corresponding 9007 sensor which has the same index as the corresponding indicator. R1--4. A DR indicator (9002) must only be provided to an OS if that particular OS image owns that resource and is going to control the physical add, remove, and replace operations on the FRU to which that particular DR indicator points. R1--5. If a PCI Hot Plug slot implements a single physical amber indicator for use both as the PCI Hot Plug DR indicator (for concurrent maintenance) and as the FRU Identify indicator, then that indicator must be presented to a LoPAR compliant OS as both a 9002 and a 9007 indicator. R1--6. All platforms must provide the “ibm,fault-behavior” and “ibm,fru-9006-deactivate” properties in the root node of the OF device tree, both with a value of 1.
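The enumeration flow required by R1--2 above can be sketched as a loop over an ibm,get-indices style call. The real RTAS call returns indices in chunks and may report a changed list across calls (due to Dynamic Reconfiguration); `get_indices` below is a mock that models only the chunked-return shape, and its return encoding is an illustrative assumption, not the real RTAS interface.

```python
# Hedged sketch of how an OS might collect all 9007 indicator indices
# by repeatedly calling an ibm,get-indices style interface.

def get_indices(all_indices, start, max_count=4):
    """Mock ibm,get-indices: return (next_start, chunk). A next_start
    of -1 means no more indices (loosely modeled encoding)."""
    chunk = all_indices[start:start + max_count]
    nxt = start + len(chunk)
    return (-1 if nxt >= len(all_indices) else nxt, chunk)

def enumerate_9007(all_indices):
    """Collect every index by looping until the call reports the end.
    A real OS would redo this after Dynamic Reconfiguration events,
    since the available indicators may have changed."""
    indices, start = [], 0
    while start != -1:
        start, chunk = get_indices(all_indices, start)
        indices.extend(chunk)
    return indices

assert enumerate_9007(list(range(10))) == list(range(10))
```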
Lightpath User Interface (UI) Requirements The base Lightpath architecture does not provide a User Interface (UI), per se, when one considers a UI to be an interactive entity; that is, one where the user can input requests as well as simply see the faults. When enabling the Identify indicators of the Lightpath mode, a UI is necessary. This architecture calls this the Lightpath UI. The Lightpath UI is an interface between the Service Focal Point (SFP) and the user of the SFP, and at a minimum provides an interface to show hidden Fault indicators (for example, see ). A slightly more sophisticated Lightpath UI -- one with a Graphical UI (GUI) such as an LCD or a general display like that provided by IBM Director -- is required to provide access to the Identify indicators. Enablement of the Identify portion of Lightpath is important in larger systems for reasons of deferred maintenance and guided maintenance. In a system with deferred maintenance and Lightpath, many Fault indicators may remain lit, requiring directed repair via an Identify operation in order to see the component against which to do a particular repair action. In addition, guided maintenance may be required even if there is no failing component, to indicate to the user where to plug or un-plug components or cables. When a Lightpath UI is available, the platform does not display logical Fault or Error Log conditions on the physical indicators until a user requests such a display of the indicators, with the exception that the highest level roll-up indicators will be lit as a flag for the user to use the Lightpath UI to identify the problem. The request to display Fault and Error Log indicators may be made, for example, by pressing a button or series of buttons, or by checking a check-box on a more sophisticated Lightpath UI. The button(s) may be physical or may be on a device like an LCD panel or other Service Focal Point (SFP) display, like an IBM Director display.
defines an SFP as: “…common point of control in the system for handling all service actions which are not resolved otherwise (for example, via Fault indicators).” SFPs may or may not exist in lower-end systems, and may exhibit different levels of sophistication in larger systems. The following are some (not all) system implementation examples: For simple systems, there may be no SFP and no Lightpath UI, which means everything needs to be resolved by Fault indicators. For simple systems implementing Triple-S (see ), there may exist a simple SFP with a simple Lightpath UI like one or more physical push-buttons. This could be the System Error indicator with a physical button associated with it, with the SFP being firmware underlying the button to communicate with lower layers of firmware (for example, turn off FRU Fault indicators as they are activated by the firmware, turn on all active FRU Fault indicators on a button press). There may also be buttons for enabling the lower layers of the Fault indicator hierarchy, and these buttons inform the SFP firmware of the user’s request to display Fault indicators on the physical indicators. In this case, the Lightpath UI is not full-function and does not provide for enablement of the Identify indicators. In this case, the firmware driving the Lightpath UI would use the Lightpath UI base enablement (see ). For intermediate systems, the Lightpath UI could be an LCD panel. In this case, the firmware driving the Lightpath UI would use the Lightpath UI base enablement (see ). The Triple-S UI is also possible (see ). For larger systems, the Lightpath UI could be part of a more sophisticated interface, like IBM Director. This more sophisticated interface would use the Lightpath UI base enablement (see ). The Triple-S UI is also possible (see ).
Lightpath UI Base Enablement Requirements This section defines the base enablement requirements for all Lightpath UI implementations. The Triple-S UI is one example of such a Lightpath UI that uses the Lightpath UI Base Enablement. Other Lightpath UIs are possible, and are not limited by this architecture. R1--1. For the Lightpath UI Base Enablement: The platform must do all of the following: Implement Lightpath Mode, as defined by this architecture, lighting the FRU Fault indicators or Error Log indicator associated with a fault. Lightpath Mode includes the implementation of Identify indicators. If the SFP is separate from the platform, then report to the SFP that the platform implements the Lightpath UI Base Enablement (explicitly or implicitly). (see implementation note, below) Whenever possible, report all fault conditions which activate a FRU Fault indicator or Error Log indicator up to the SFP, with enough information to allow determination by the SFP as to which FRU or Error Log indicators are activated and the possible failing FRU(s). See the implementation note, below, for the only exception cases allowed to this requirement. Accept commands from the Service Focal Point (SFP) to put each indicator (FRU, Enclosure, etc.) into the Off, Fault, and Identify states (that is, the SFP can control each indicator), and not report an activation or deactivation error to the SFP if the SFP requests putting the indicator into a state to which the indicator is already activated. See the implementation note, below, for the only exception cases allowed to this requirement. Prevent multiple reports of the same error, whenever possible. (see implementation note, below) Implementation Notes: Requirement allows an SFP to manage multiple platforms that implement different Service Indicator modes. Note that this requirement can be implemented implicitly from other information reported to the SFP (for example, machine type/model).
In Requirement and , acceptable reasons for not being able to report errors to the SFP or have the SFP control the LEDs may include: Loss of communications between the component and the SFP. A fault indicator that is entirely controlled by an OS, hardware, or code, or an entity which is not in communications with the platform firmware or the SFP. Requirement prevents continual “blinking” of Fault indicators and the flooding of the SFP’s event or error log. R1--2. For the Lightpath UI Base Enablement: The Service Focal Point (SFP) must exist and must do all of the following: Receive and log fault conditions reported by the platform. (see implementation note, below) Turn off each Fault indicator or Error Log indicator associated with a fault condition, as soon as possible after the fault is reported, except as required to remain on by user request (for example, see Requirement ). Accept direction from a user to show any faults on the Fault and Error Log indicators (for example, see Requirement ). If the SFP contains a GUI (for example, an LCD display or a display like that provided by IBM Director), accept direction from a user to Identify a FRU or connector for a service operation, and then turn off all activated FRU Fault and Error Log indicators, unless otherwise directed by the user (for example, by a check-box on the UI), and activate the FRU Identify (blink), along with the normal FRU roll-up defined by the base Lightpath Mode. Implementation Notes: Relative to Requirements , an SFP may (but is not required to) do additional failure analysis, or may apply policy rules, on the failure(s) reported, and by doing so may change or re-prioritize the list of failures, such that the most likely failure(s) is (are) different from the fault indicator(s) that were initially turned on by the detecting entity. In that case, when the user requests that the indicators be reactivated, a different set may be activated than those that were originally activated.
In simpler systems, it is expected that there may only be one push-button implemented, and that would be associated with the highest level Fault roll-up indicator. For systems, or collections of systems managed by an SFP, that consist of many enclosures, it may be useful for an implementation to provide several levels of buttons. For example, an SFP that manages multiple systems may (at least) implement one button per system.
See/Select/Service (Triple-S) User Interface Requirements The Triple-S UI architecture is built on top of the Lightpath UI Base Enablement architecture, which is in turn built on top of the Lightpath architecture. The Triple-S architecture is basically defined as follows (see Requirements for specifics): Do not display Fault or Error Log conditions on the physical indicators until the user pushes a button or series of buttons, except that the highest level roll-up for the Enclosure Fault indicators and Error Log indicators will be activated if a lower level one of the same type was activated. After seeing that the highest level roll-up for Enclosure Fault or Error Log is on, the user pushes one or more buttons (logical or physical) associated with those, and then the user Sees the Faults available for servicing. The user Selects the item they want to service, by observing the FRU Faults and selecting the one they want to Service. The Select part of Triple-S may also involve activation of the FRU Identify indicator from the Lightpath UI. The user completes the Service action on the component which was selected. R1--1. For the Triple-S UI: The Lightpath UI Base Enablement requirements must be met, as defined in . R1--2. For the Triple-S UI: The platform must provide one or more push-buttons (physical buttons, or logical ones on a GUI), each associated with a set of Fault indicators or Error Log indicators, which are to be used by the user to display (“show”) or not display (“hide”) fault conditions on that group of indicators, as defined by Requirement . R1--3. For the Triple-S UI: The Service Focal Point (SFP) must accept direction from a user, via a push-button (physical or logical) press, to show any fault conditions on, or hide all fault conditions from, the physical indicators (FRU Faults and any associated roll-up indicators for those indicators) which are associated with the push-button (Fault or Error Log indicators).
The fault conditions must represent any open problems known by the SFP related to the set of indicators associated with the push-button. The push of the button must be a toggle operation, with each press going either from the show state to the hide state or from the hide state to the show state, based on the state prior to the push-button press. The platform must turn off any indicators turned on by these push-button activations after a pre-set period of time after the button activation, unless the pre-set time is set to 0, in which case the indicators are left on until the button is pressed again or until the platform determines they no longer need to be on. (see implementation note, below). R1--4. For the Triple-S UI: For more complex systems, and as determined by the RAS requirements for those systems, the SFP must implement a GUI (for example, an LCD or IBM Director display) and provide the capability to activate the Identify indicators, as defined in Requirement . Implementation Notes: Relative to Requirement , an SFP may (but is not required to) do additional failure analysis, or may apply policy rules, on the failure(s) reported, and by doing so may change or re-prioritize the list of failures, such that the most likely failure(s) is (are) different from the fault indicator(s) that were initially turned on by the detecting entity. In that case, when the user requests that the indicators be reactivated, a different set may be activated than those that were originally activated. Relative to Requirement , the set of indicators associated with a given push-button will normally be hierarchical, based on the FRU Fault or Error Log roll-up path. For example, if a push-button is associated with the Chassis Enclosure Fault indicator, pressing that button would toggle the show/hide state for all Fault indicators within that Chassis.
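The toggle-plus-timeout behavior described above can be sketched as a small state machine. This is an assumption-laden model for illustration (class and method names are invented here), not the SFP implementation.

```python
# Sketch of the Triple-S show/hide push-button behavior: each press
# toggles the state, and a nonzero pre-set timeout auto-hides the
# indicators after it expires; a pre-set of 0 leaves them shown until
# the next press (or until the platform decides otherwise).

class ShowHideToggle:
    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds   # 0 means "no auto-hide"
        self.showing = False
        self.elapsed = 0

    def press(self):
        """Toggle between show and hide on each button press."""
        self.showing = not self.showing
        self.elapsed = 0
        return self.showing

    def tick(self, seconds):
        """Advance time; auto-hide after the pre-set timeout, if any."""
        if self.showing and self.timeout > 0:
            self.elapsed += seconds
            if self.elapsed >= self.timeout:
                self.showing = False
        return self.showing

t = ShowHideToggle(timeout_seconds=30)
assert t.press() is True       # first press: show faults
assert t.tick(31) is False     # timeout expired: indicators hidden again
t0 = ShowHideToggle(timeout_seconds=0)
assert t0.press() is True
assert t0.tick(3600) is True   # pre-set of 0: stays shown
assert t0.press() is False     # until the next press hides it
```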
Another example is pressing a button associated with the System Error roll-up indicator for a system; putting that system into the “show” state could put that system basically into a Lightpath (without Triple-S) mode, or “Lightpath Classic” mode. In this latter example, it is not quite like previous implementations of Lightpath because (1) service procedures may be different, (2) based on Implementation Note (a), the set of FRU Faults activated by the SFP may be different from those activated by the entity detecting the error originally, and (3) the Identify function can be used.
Green Indicator Requirements This chapter defines the platform requirements for green indicators. The usage of green indicators has been separated from the rest of this chapter because, even though green indicators are used in some service procedures (for example, to check for the presence or absence of power on a component or system), they are not to be used in lieu of amber FRU Fault and Identify indicators. That is, they should supplement, not replace, the amber indicators. There are several exceptions to having all the green indicator requirements in this chapter: The green indicator associated with a capacitor and pushbutton implementation is specified in Requirement . The capability to light all green indicators, as well as the amber and blue indicators, for test purposes, is specified in Requirement . Unless indicated otherwise in this chapter, the blink rate for green indicators, when they blink, is specified in Requirement .
Green Indicator Uses and General Requirements Green indicators generally are not used for indicating a fault condition. R1--1. A green indicator must not be used in place of an amber Fault/Identify indicator, except when use of an amber Fault/Identify indicator is not possible; in this exceptional case, the green indicator must be off or blinking to indicate the error condition. Implementation Note: Examples where a green indicator might be used instead of an amber Fault/Identify indicator are: In a power supply, to indicate lack of AC power (green off). In the case where there is insufficient power to power the component (green blinking). R1--2. There must exist a green power indicator for every FRU that is to participate in concurrent maintenance (“hot plug” operation), unless that FRU does not require the removal of power to remove or insert that FRU.
Green Indicator States This section attempts to capture the state requirements for all usages of green indicators. If a state or usage is not specified, then the implementer needs to consult the Architecture team for this architecture in order to add or replace any state or usage of that state.
Power Supply Green Indicators R1--1. For power supply indicators, the platform must implement the states defined in for each green indicator, and must use those states only for the usages stated in .
Power Supply Green Indicator States and Usage:
Any state not already covered in this table: Consult with the xipSIA architecture team for the proper usage/behavior.
Off: For the input power indicator, no input power. For the output power indicator, no output power.
On: For the input power indicator, input power good. For the output power indicator, output power good.
Slow blink (1 Hz, 50% duty cycle): Power supply (or supplies) are in the standby state. A power supply must not blink its green output power indicator unless that particular supply is in the standby state.
System Power Green Indicators R1--1. For system power indicators, the platform must implement the states defined in for each green indicator, and must use those states only for the usages stated in .
System Power Green Indicator States and Usage:
Any state not already covered in this table: Consult with the xipSIA architecture team for the proper usage/behavior.
Off: System is off (no standby power).
On: System is on (operational state).
Fast blink (4 Hz, 50% duty cycle): A determination is being made as to whether the system (for example, a Blade in a Blade System) has enough power available to it in order to power up, or a determination has already been made that there is not enough power and the indicator remains in this state.
Slow blink (1 Hz, 50% duty cycle): System is in the standby power state.
Fade-in/fade-out cycling of the power LED, as done by various PC and notebook manufacturers (the period of this fade-in/fade-out cycle is 2 seconds, gradually ranging from fully on to fully off): Systems that support system-level sleep states (such as the S3 sleep state) must use this state to indicate that the system is sleeping but still powered on.
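The blink timings in the table above (slow blink at 1 Hz, fast blink at 4 Hz, both 50% duty cycle) can be modeled with a small timing function. This is a pure illustration of the waveform, not driver code; the convention that the LED is lit during the first half of each period is an assumption.

```python
# Sketch of a 50% duty cycle blink: at rate_hz, the LED period is
# 1/rate_hz seconds, and the LED is on for the first `duty` fraction
# of each period.

def led_on_at(t_seconds, rate_hz, duty=0.5):
    """True if a blinking LED is lit at time t for the given rate/duty."""
    period = 1.0 / rate_hz
    phase = (t_seconds % period) / period
    return phase < duty

# Slow blink, 1 Hz: on during the first 0.5 s of every second.
assert led_on_at(0.25, rate_hz=1)
assert not led_on_at(0.75, rate_hz=1)
# Fast blink, 4 Hz: period 0.25 s, on for the first 0.125 s of each.
assert led_on_at(0.10, rate_hz=4)
assert not led_on_at(0.15, rate_hz=4)
```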
HDD Green Indicators R1--1. For Hard Disk Drives (HDDs), the platform must implement the states defined in for each green indicator, and must use those states only for the usages stated in .
HDD Green Indicator States and Usage:
Any state not already covered in this table: Consult with the xipSIA architecture team for the proper usage/behavior.
Off: Platform specific.
On: Platform specific.
Flickering (randomly blinking): HDD activity (HDD is powered on and is being used).
Other Component/FRU Green Indicators This section attempts to capture the state requirements for usages of green indicators that are not specifically called out as special cases elsewhere in . To reiterate what was specified above: if a state or usage is not specified, then the implementer needs to consult the Architecture team for this architecture in order to add or replace any state or usage of that state. R1--1. For FRUs or components, other than the specific ones specified elsewhere in , which require power indicators, the platform must implement the states defined in for each green indicator, and must use those states only for the usages stated in .
Sub-Unit (Component) Green Indicator States and Usage:
Any state not already covered in this table: Consult with the xipSIA architecture team for the proper usage/behavior.
Off: Component/FRU is powered off and/or is not in operation.
On: Component/FRU is powered on.
Blink (1 Hz, 50% duty cycle) (Optional): Component/FRU is in transition to the off state. Note that although this is an optional state, it is highly recommended (for Human Factors reasons) for cases where it takes a while to power off the component (for example, for hardware like a Blade in a Blade System that has to be quiesced before powering off).
Communication Link Green Indicators R1--1. For communication links, the platform must implement the states defined in for each green indicator, and must use those states only for the usages stated in .
Communication Link Green Indicator States and Usage:
Any state not already covered in this table: Consult with the xipSIA architecture team for the proper usage/behavior.
Off: No link connection, or link connected but no activity.
Flickering (randomly blinking): Communication link activity.