Dynamic Reconfiguration (DR) Architecture

Dynamic Reconfiguration (DR) is the capability of a system to adapt to changes in the physical or logical hardware/firmware configuration, and to be able to use the new configuration, all without having to turn the platform power off or restart the OS. This section defines the requirements for systems that support DR operations.
DR Architecture Structure shows the relationship of the DR architecture with LoPAR and the relationship of the individual DR pieces with the base DR architecture. Each specific DR option (for example, PCI Hot Plug) has a piece that sits on top of the base DR option. The base DR option is the set of requirements that will be implemented by all DR platforms and that will be utilized by the OS that supports any of the specific DR options. The specific DR options call out the base DR option requirements as being required. Therefore, in the figure, any specific DR option is really that specific DR option piece plus the base DR option.

The base DR option is not a stand-alone option; a platform which supports the base DR option without one or more of the specific DR option pieces that sit on top of it has not implemented the DR architecture to a level that will provide any DR function to the user. Likewise, a DR entity will meet the requirements of at least one of the specific DR options, or else software is not required to support it as a DR entity. Thus, the base DR option is the common building block and structure upon which all other specific DR options are built.

DR operations can be physical or logical. Currently, the only physical DR entities are PCI Hot Plug entities; that is, the OS only has control over the physical DR operations on PCI IOAs. The current direction for hot plug of other DR entities is to do the physical hot plug (power up/down, control of service indicators, etc.) via the HMC and to bring the entity into usage by an OS via logical DR operations (Logical Resource DR, or LRDR). The PCI Hot Plug DR option can be found in . The Logical Resource Dynamic Reconfiguration option can be found in . It is expected that, as time goes on, the base DR option may be expanded by the addition of other DR options.
DR Architecture Structure
Definitions Used in DR

DR Definitions

Base Dynamic Reconfiguration (DR) option
    The base on which all of the specific DR options are built. Specific DR options include, for example, the PCI Hot Plug DR option, processor card DR option, etc. These specific DR options each include the requirement that all the base DR option requirements be met. See for more information about the structure of the DR architecture pieces.

Dynamic Reconfiguration (DR)
    The capability of a system to adapt to changes in the hardware/firmware configuration with the power on and the OS operating, and to be able to use the new configuration. This is a piece of the High Availability puzzle, but only one of the pieces. Addition, removal, and replacement may, in general, be done with the power on or off on the connector into which the entity is being added or removed. For the PCI Hot Plug option, the power to the slot is turned off and the logic signals are electrically isolated from the connector during the plug or unplug operation.

Depth First
    Refers to a method where a tree structure (for example, a set of PCI buses connected by PCI to PCI bridges) is traversed from the top to the bottom before all the siblings at any particular level are acted upon.

DR Connector (DRC)
    The term “DR connector” will be used here to define the plug-in point for the entity that is participating in DR. For example, a ‘slot’ into which a PCI IOA is inserted is a DRC.

DR Entity
    An entity that can participate in DR operations. That is, an entity that can be added or removed from the platform while the platform power is on and the system remains operational. See also the definitions of logical and physical DR entities.

DR Operation
    The act of removing, adding or replacing a DR Entity.

Entity
    One or more I/O devices, IOAs, Processor cards, etc., that are treated as one unit.

High Availability (HA) System
    A system that gives the customer “close” to continuous availability, but allows for some system down-time. Besides DR, other factors that need to be considered in the design of an HA system include system partitioning, clustering, redundancy, error recovery, failure prediction, Error Detection and Fault Isolation (EDFI), software failure detection/recovery, etc.

I/O Adapter (IOA)
    A device which attaches to a physical bus which is capable of supporting I/O (a physical IOA) or a logical bus (a virtual IOA) and which has its own separate set of resources is referred to as an IOA. The term “IOA” without the qualifier “physical” or “virtual” will be used to designate a physical IOA. Virtual IOAs are defined further in . Resources which must have the capability of being separate (from other devices) include: MMIO Load/Store address spaces, configuration address spaces, DMA address spaces, power domains, error domains, interrupt domains, and reset domains. Note that the hardware of an IOA may allow for separation of these resources but the platform or system implementation may limit the separation (for example, shared error domains). In PCI terms, an IOA may be defined by a unique combination of its assigned bus number and device number, but not including its function number; an IOA may be a single or multi-function device, unless otherwise specified by the context of the text. Examples include LAN and SCSI IOAs. A PCI IOA may exist as multiple device nodes in the OF device tree; that is, the OF may treat separate “functions” in an IOA as separate OF device tree nodes.

IOA: built-in
    An IOA that is not pluggable by the user. Sometimes called integrated I/O. As opposed to an IOA that may be removed as part of a plug-in card removal (see definition for a plug-in card, below).

I/O Bus
    A hardware interface onto which an IOA can be plugged on a platform. I/O buses discussed here include the following.

I/O Bus: PCI
    The term “PCI” refers to one of: conventional PCI, PCI-X, or PCI Express. The term “bus” in the case of PCI Express refers to a PCI Express link.

I/O Bus: System Bus
    The system bus in a platform is normally used only to attach CPUs, memory controllers, and Host Bridges to bridge to I/O buses. A platform’s system bus may, in certain circumstances, be used to attach very high speed IOAs. DR of system bus-attached entities is not considered here.

I/O Device
    An entity that is connected to an IOA (usually through a cable). A SCSI-attached DASD device is an example. Some I/O devices and their connection points to the IOAs are designed to be plugged while the connection point is operational to the other I/O devices connected to the same IOA, and some are not. For example, while the SCSI bus was not initially designed to have devices added and removed while the SCSI bus was operational, different vendors have found ways to do so. For example, SCSI-attached DASD is pluggable and unpluggable from the SCSI bus in some platforms.

Live Insertion
    A DR operation where the power remains on at the DR connector. Live insertion entities are always powered unless the machine power is shut off or unless a subsystem containing those entities is shut off.

Logical DR entity
    A DR entity which does not have to be physically plugged or unplugged during a DR operation on that entity. See for a list of the supported Logical DR types.

Logical Resource DR
    The name of the option for support of DR of logical entities. See .

PCI Hot Plug
    DR for PCI plug-in cards where there is a separate power domain for each PCI Hot Plug slot. Platforms which do not provide individual control of power and isolation for each PCI slot but which do provide power and isolation control for groups of PCI slots (that is, multiple slots per power domain) do not provide “PCI Hot Plug,” but can support PCI DR.

Physical DR entity
    A DR entity which may need to be physically plugged or unplugged during a DR operation on that entity. See for a list of the supported physical DR types.

Plug-in card
    A card which can be plugged into an I/O connector in a platform and which contains one or more IOAs and potentially one or more I/O bridges or switches.

Subsystem
    One or more I/O devices, IOAs, Processor cards, etc., that are treated as one unit, for purposes of removal/insertion.
Architectural Limitations

The DR architecture places a few limitations on the implementations. Current architectural limitations include:

- DR operations will be user initiated at the software level before any physical plugging or unplugging of hardware is performed. This architecture will be flexible enough to add additional methods for invoking the process in the future, but for the initial architecture it will be assumed that the operation is invoked by the user via a software method (for example, invoking an OS DR services program). It is expected that some technologies which will be added in the future will allow plugging/unplugging without the user first informing the software (for example, P1394 and USB).
- Critical system resources cannot be removed via a DR operation. Which system resources are critical will not be defined by this architecture; it is expected that this determination will be made by the OS implementation and/or architecture. Loss of a critical resource would stop the system from operating.
- Many of the RTAS calls will need to work properly, independent of what is powered-off (for example, NVRAM access must work during DR operations). This is partially encompassed by the previous bullet. For more information, see .
- Any special requirements relative to redundant power supplies or cooling are not addressed here.
- Moving of a DR entity from one location to another in a platform is supported through a “remove and add” methodology rather than a specific architecture which defines the constructs necessary to allow moving of pieces of the platform around.

Note: The current AIX implementation does a “remove and add” sequence even when the overall DR operation is a replacement. That is, first the old entity is removed, and then the new entity is added.
Dynamic Reconfiguration State Transitions shows the states and transitions for the dynamic reconfiguration entities (DR Entities). The transition between states is initiated by a program action (RTAS functions) provided the conditions for the transition are met.

Note: Relative to , physical DRC types are brought into the “owned by the OS” states either: (1) by the Device Tree at boot time, or (2) by a DLPAR operation, which brings in the logical DRC “above” the physical DRC first, and drags the physical DRC in as part of transferring from state 3 to state 4. Therefore no states appear in the “owned by platform” section under Hot Plug DR in the figure. So, for example, the DLPAR assignment of a PCI physical slot to an OS is done by assigning the logical SLOT DRC above the physical PCI slot, giving the following state transitions: state 1, to state 2, to state 3, to state 4, at which time the OS sees the physical slot and sees an IOA in the physical slot (via get-sensor-state (dr-entity-sense) of the physical DRC returning “present”), and then proceeds with the state transitions: state 5, to state 6, to state 7, to state 8. The reverse of this (DLPAR removal of the PCI slot) is: state 8, to state 6, to state 5, to state 4, to state 2, to state 1.
Dynamic Reconfiguration State Transition Diagrams
Notes:

1. In State 5, if empty status is returned from the get-sensor-state dr-entity-sense call, then do not attempt to power on.
2. Transitions from State 8 to 6 or from State 6 to 5 may fail (set-indicator isolation-state isolate, and get-sensor-state dr-entity-sense) if the hardware cannot be accessed to control these operations. In this case, the OS may ignore those errors if the operation is a DLPAR removal of the hardware. See also the “ibm,ignore-hp-po-fails-for-dlpar” property in .
Base DR Option
For All DR Options - Platform Requirements

This section contains the extra requirements placed on the platform for all of the various DR configurations. At this time, there are no provisions made in the DR architecture for unexpected removal of hardware or insertion of hardware into a DR connector. Therefore the user is expected to interact with the DR software prior to changing the hardware configuration. For example, it is expected that most systems will require a keyboard action prior to the hardware configuration change. Future architecture might allow for other possibilities. For example, a push-button switch at the DR connector may be provided which causes an interrupt to the OS to signal that an operation is about to take place on the connector. (The push-button method is one that has been mentioned as a possible enhancement for systems that are produced for telephone company applications.) As mentioned in , the requirements in this section are not stand-alone requirements; the platform will also need to implement one or more of the specific DR options.

R1--1. For all DR options: If the “ibm,configure-connector” property exists in the /rtas node of the OF device tree, then the platform must meet all of the requirements for the Base DR option (that is, all of the requirements labeled “For all DR options”), and must also meet all the requirements for at least one of the specific DR options.

R1--2. For all DR options: The platform and OS must adhere to the design and usage restrictions on RTAS routines defined in , and any RTAS calls not specified in must comply with Note and .

RTAS Call Operation During DR Operations (RTAS call name: applicable note numbers)

- rtas-last-error: 1
- check-exception: 1, 2
- display-character: 1
- event-scan: 1, 2
- query-cpu-stopped-state: 4
- get-power-level: 4
- get-sensor-state: 3, 4
- get-time-of-day: 1
- ibm,configure-connector: 7
- ibm,exti2c: 1
- ibm,os-term: 1
- nvram-fetch: 1
- nvram-store: 1
- power-off: 1, 6
- ibm,power-off-ups: none
- ibm,read-pci-config: 4
- ibm,write-pci-config: 4, 7
- restart-rtas: 1
- set-indicator: 3, 4, 5
- set-power-level: 3, 4, 5
- set-time-for-power-on: 1
- set-time-of-day: 1
- start-cpu: 4
- stop-self: 7
- system-reboot: 1
Notes:

1. These RTAS calls function as specified in this architecture, regardless of the power state of any DR entity in the platform (providing the call is implemented).
2. These RTAS calls do not cause errors nor return an error status by accessing hardware which is isolated, unusable, and/or powered down.
3. These RTAS calls function properly when dealing with a DR connector, when the parent of that DR connector is powered and configured, regardless of the state of the child of the parent (for set-indicator, the isolation-state and dr-indicator names, and for get-sensor-state, the dr-entity-sense sensor name).
4. The results of the OS issuing these RTAS calls to hardware, when the access to that hardware is through hardware which is isolated, unusable, powered off, or incompletely configured, are indeterminate.
5. The results of the OS changing the power or isolation state of a Dynamic Reconfiguration connector while there is an uncompleted ibm,configure-connector operation in progress against that connector are indeterminate.
6. Power domains which were defined within sub-trees which have been subsequently isolated may remain un-modified by this call; their state will be platform dependent.
7. The results of the OS issuing these RTAS calls to hardware which is isolated and/or powered off are indeterminate.
R1--3. For all DR options: If there is Forth code associated with a DR entity, it must not modify the OF device tree properties or methods unless modifications can be hidden by the ibm,configure-connector RTAS call (that is, where this RTAS routine recognizes the entity and creates the appropriate OF device tree characteristics that would have been created by the Forth code).

R1--4. For all DR options: The hardware must protect against any physical damage to components if the DR entity is removed or inserted while power is on at the DR connector.

R1--5. For all DR options: During a DR operation (including resetting and removing the reset from the entity, powering up and powering down the entity, unisolating and isolating the entity, and physically inserting and removing the entity), the platform must prevent the introduction of unrecoverable errors on the bus or interconnect into which the DR entity is being inserted or removed.

R1--6. For all DR options: During a DR operation (including resetting and removing the reset from the entity, powering up and powering down the entity, unisolating and isolating the entity, and physically inserting and removing the entity), the platform must prevent damage to the DR entity and the planar due to any electrical transitions.

R1--7. For all DR options: If there are any live insertion DR entities in a platform, and if those entities or the rest of the platform cannot tolerate the power being turned off to those entities during DR operations on other DR entities, then they must not be placed in the same power domain as the DR entities that will be powered off.

R1--8. For all DR options: A separate visual indicator must be provided for each physical DR connector which can be used for insertion of a DR Entity or which contains a DR entity that can be removed, and the indicator must be individually controllable via the set-indicator RTAS call, and must have the capability to be set to the states as indicated in and .

R1--9. For all DR options: If a platform provides a separate indicator to indicate the state of the power for the DR connector, then that LED must be turned on by the platform when the platform turns the power on to the DR connector and must be turned off by the platform when the platform turns the power off to the DR connector.

R1--10. For all DR options: If a DR entity requires power to be turned off prior to the physical removal of the DR entity from the platform, then the hardware must provide a green power indicator to indicate the power state of the DR entity.

R1--11. For all DR options: The platform must provide any necessary power sequencing between voltages within a power domain during DR operations (for example, during the set-power-level RTAS call).

R1--12. For all DR options: If a platform supports DR, then all DR entities must support the full on to off and the off to full on power transitions.

Architecture Note: Requirement is necessary so that the OS can count on the availability of certain RTAS facilities and so that the OS does not use other RTAS facilities when they are not available. This may put certain hardware restrictions on what can and cannot be shut down.

Hardware Implementation Notes:

1. Requirement requires careful planning of hardware design and platform structure to assure that no resources critical to RTAS are put into power domains that are powered down as part of a DR operation. In addition, the platform is required to provide the facilities (registers and bits in registers readable by firmware, etc.) so that RTAS can query the state of the hardware and determine if something is powered off before actually accessing the powered-off hardware.
2. Requirement indicates that there cannot be any sharing of indicators between DR connectors.
3. In some large systems (for example, systems with many racks of equipment) it may not be possible or convenient to view the individual DR visual indicators without opening cabinet doors, etc. In such cases, the designers of such systems could consider putting a “summary” visual indicator where the user could readily see it, which is basically a logical “or” of the visual indicators which are out of sight. For example, in a rack-based system, the drawers might have an indicator on the front of the drawer that indicates if any indicators on the back of the drawer are flashing. This summary indicator will not be accessed by the software (that is, it will be transparent to the software), but it is permissible for the indicator to have firmware dependencies.
For All DR Options - OF Requirements This section describes the OF properties added for DR and any additional requirements placed on OF due to DR. This section defines a number of new DR properties which are arrays. All properties for a specific DR connector under a node are at the same offset into each array. Also, when the descriptive text states “the first connector” this does not imply any physical position or numbering, but rather a logical “first” connector beneath a particular node in the OF device tree.
General Requirements

R1--1. For all DR options: When the firmware passes control to the OS, the DR hardware must be initialized such that all of the DR connectors which would return “DR entity present” to a get-sensor-state (dr-entity-sense) call are fully powered and operational, and any DR visual indicators are set to the appropriate state (on or off) as indicated by .

R1--2. For all DR options: After the firmware has passed control to the OS, the state of the DR visual indicators must not change except under the following conditions:

- As directed to do so by the set-indicator RTAS call.
- Under the condition of a power-fault, in which case the hardware may change the state of the visual indicator to the “off” state if it turns the power off to the slot.

R1--3. For all DR options: The platforms which have hierarchical power domains must provide the “power-domains-tree” property in the OF device tree.
<emphasis role="bold"><literal>“ibm,drc-indexes”</literal></emphasis> Property This property is added for the DR option to specify for each DR connector an index to be passed between the OS and RTAS to identify the DR connector to be operated upon. This property is in the parent node of the DR connector to which the property applies. See for the definition of this property. See for additional information. R1--1. For all DR options: For each OF device tree node which supports DR operations on its children, the OF must provide an “ibm,drc-indexes” property for that node.
<emphasis role="bold"><literal>“ibm,my-drc-index”</literal></emphasis> Property This property is added for the DR option to specify for each node which has a DR connector between it and its parent, the value of the entry in the “ibm,drc-indexes” property for that connector. This property is used for correlation purposes. See for the definition of this property. R1--1. For all DR options: For each OF device tree node which has a DR connector between it and its parent, the OF must provide an “ibm,my-drc-index” property for that node.
<emphasis role="bold"><literal>“ibm,drc-names”</literal></emphasis> Property This property is added for the DR option to specify for each DR connector a user-readable location code for the connector. See for the definition of this property. See for additional information. R1--1. For all DR options: For each OF device tree node which supports DR operations on its children, the OF must provide an “ibm,drc-names” property for that node. R1--2. For all DR options: The content of the “ibm,drc-names” property must be of the format defined in . <emphasis role="bold"><literal>“ibm,drc-names”</literal></emphasis> Property Format DRC Type DRC Name 1-8, 11-30 (PCI Hot Plug) Location code SLOT Location code (built-in has port suffix) PORT Port x CPU CPU x where “x” is a decimal number with one or more digits and no leading zeroes MEM or MEM-n LMB x where “x” is a decimal number with one or more digits and no leading zeroes PHB PHB x where “x” is a decimal number with one or more digits and no leading zeroes COPLATFAC location code COPLATFUN location code
<emphasis role="bold"><literal>“ibm,drc-power-domains”</literal></emphasis> Property This property is added for the DR option to specify for each DR connector the power domain in which the connector resides. See for the definition of this property. See for additional information. R1--1. For all DR options: For each OF device tree node which supports DR operations on its children, the OF must provide an “ibm,drc-power-domains” property for that node. Software Implementation Notes: Software will not call the set-power-level RTAS call with an invalid power domain number, and for purposes of this call, a power domain number of -1 (a live insert connector) is considered invalid. For the case where the power domain is -1 (the live insert case), this does not imply that the connector does not need isolating before the DR operation, only that it does not need to be powered off.
<emphasis role="bold"><literal>“ibm,drc-types”</literal></emphasis> Property This property is added for the DR option to specify for each DR connector a user-readable connector type for the connector. See for the definition of this property. See for additional information. Architecture Note: The logical connectors (CPU, MEM etc.) represent DR boundaries that may not have physical DR connectors associated with them. If a physical DR boundaries were present they would be represented by a different DR connector type. It is possible that a given boundary may be represented by both a physical and a logical connector. In that case, logical assignment would be managed with the logical connector and physical add/remove would be managed by specifying the physical DR connector. R1--1. For all DR options: For each OF device tree node which supports DR operations on its children, the OF must provide an “ibm,drc-types” property for that node.
<emphasis role="bold"><literal>“ibm,phandle”</literal></emphasis> Property This property is added for the DR option to specify the phandle for each OF device tree node returned by ibm,configure-connector. See for the definition of this property. R1--1. For all DR options: The ibm,configure-connector RTAS call will include the “ibm,phandle” property in each OF device tree node that it returns. This phandle must be unique and consistent with any phandle visible to an OF client program or any other information returned by ibm,configure-connector.
<emphasis role="bold"><literal>“ibm,drc-info”</literal></emphasis> Property This property is added to consolidate the information provided by the “ibm,drc-indexes”, “ibm,drc-names”, “ibm,drc-types” and “ibm,drc-power-domains” properties. When present, it replaces those properties. R1--1. For each OF device tree node which supports DR operations on its children, OF must provide an “ibm,drc-info” property for that node. R1--2. The “ibm,drc-info” property shall only be present if the Operating System indicates support for this new property definition, otherwise, the “ibm,drc-indexes”, “ibm,drc-names”, “ibm,drc-types” and “ibm,drc-power-domains” will be present. R1--3. Following live partition migration the Operating System must be prepared to support either the “ibm,drc-info” property or the “ibm,drc-indexes”, “ibm,drc-names”, “ibm,drc-types” and “ibm,drc-power-domains” set of properties. The property or properties presented will based on the capability of the target partition.
For All DR Options - RTAS Requirements For platforms that implement DR, there is one new RTAS call and some changes (new requirements) placed on existing ones.
General Requirements The following are the general requirements for RTAS for all DR options. R1--1. For all DR options: If there is Forth code associated with a DR entity and that Forth code would normally modify the OF device tree properties or methods, then if that entity is to be supported as a DR entity on a particular platform, the ibm,configure-connector RTAS call on that platform must recognize that entity and create the appropriate OF device tree characteristics that would have been created by the Forth code.
set-power-level

This RTAS call is defined in . Several additional requirements are placed on this call when the platform implements DR along with PM. This RTAS call is used in DR to power up or power down a DR connector, if necessary (that is, if there is a non-zero power domain listed for the DR connector in the “ibm,drc-power-domains” or “ibm,drc-info” property). The input is the power domain and the output is the power level that is actually to be set for that domain; for purposes of DR, only two of the current power levels are of interest: “full on” and “off.” For sequencing requirements between this RTAS routine and others, see Requirements and .

R1--1. For all DR options: The set-power-level RTAS call must be implemented as specified in and the further requirements of this DR option.

R1--2. For all DR options: The set-power-level RTAS call must initiate the operation and return “busy” status for each call until the operation is actually complete.

R1--3. For all DR options: If a DR operation involves the user inserting a DR entity, then if the firmware can determine that the inserted entity would cause a system disturbance, the set-power-level RTAS call must not power up the entity and must return an error status which is unique to that particular type of error, as indicated in .

set-power-level Error Status for Specific DR Options

- Out, Status (PCI Hot Plug DR option): -9000: Powering the entity would create a change of frequency on the bus and would disturb the operation of other PCI IOAs on the bus; therefore the entity was not powered up.
Hardware Implementation Notes:

1. For any DR operation, the firmware could optionally not allow powering up of a DR entity if the powering up would cause a platform over-power condition (the firmware would have to be provided with the DR Entities’ power requirements and the platform’s power capability by a method which is not architected by the DR architecture).
2. If PM is not implemented in the platform, then only the “full on” and “off” states need to be implemented for DR, and only those two states will be used.

Software Implementation Note: The operation of the set-power-level call is not complete at the time of the return from the call if the “busy” status is returned. If it is necessary to know when the operation is complete, the routine should be called with the same parameters until a non-busy status is returned.
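Programming Note: The busy-status loop described in the Software Implementation Note might look like the following C sketch. The rtas_call() entry point is assumed to behave like the Linux powerpc one (it returns the status word and stores any further outputs through the supplied pointer); the delay helper and the level encodings for “full on” and “off” are assumptions of this sketch, not definitions from this architecture:

    #define RTAS_BUSY (-2) /* "busy": call again with the same parameters */

    /* Assumed interface, modeled on the Linux powerpc RTAS entry point:
     * returns the status word; further outputs are stored in *outputs. */
    extern int rtas_call(int token, int nargs, int nret, int *outputs, ...);
    extern void short_delay(void); /* placeholder back-off between retries */

    /* Set a DR power domain to the given level ("full on" or "off"),
     * looping while the platform reports busy. On success, *level_set
     * holds the level actually set for the domain. */
    static int dr_set_power(int set_power_level_token, int power_domain,
                            int level, int *level_set)
    {
        int status;

        do {
            /* set-power-level: 2 inputs (domain, level),
             * 2 outputs (status, actual level set). */
            status = rtas_call(set_power_level_token, 2, 2, level_set,
                               power_domain, level);
            if (status == RTAS_BUSY)
                short_delay(); /* operation still in progress */
        } while (status == RTAS_BUSY);

        return status; /* 0 = success; -9000 = would disturb the bus (PCI) */
    }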
get-sensor-state

This RTAS call is defined in . This RTAS call will be used in DR to determine if there is something connected to the DR connector. The “rtas-sensors” and “ibm,sensor-<token>” OF properties are not applicable to DR sensors defined in .

R1--1. For all DR options: RTAS must implement the get-sensor-state RTAS call.

R1--2. For all DR options: The sensor values specified in must be implemented as specified in that table.

get-sensor-state Defined Sensors for All DR Options

Sensor name: dr-entity-sense. Token value: 9003. Defined sensor values:

- DR connector empty (0): Returned for physical DR entities if the connector is available (empty) for an add operation. The DR connector must be allocated to the OS to return this value; otherwise a status of -3 (no such sensor implemented) will be returned from the get-sensor-state RTAS call.
- DR entity present (1): Returned for logical and physical DR entities when the DR connector is allocated to the OS and the DR entity is present. For physical DR entities, this indicates that the DR connector actually has a DR entity plugged into it. For DR connectors of physical DR entities, the DR connector must be allocated to the OS to return this value; otherwise a status of -3 (no such sensor implemented) will be returned from the get-sensor-state RTAS call. For DR connectors of logical DR entities, the DR connector must be allocated to the OS to return this value; otherwise a sensor value of 2 or 3 will be returned.
- DR entity unusable (2): Returned for logical DR entities when the DR entity is not currently available to the OS, but may possibly be made available to the OS by calling set-indicator with the allocation-state indicator, setting that indicator to usable.
- DR entity available for exchange (3): Returned for logical DR entities when the DR entity is available for exchange in a sparing type operation, in which case the OS can claim that resource by doing a set-indicator RTAS call with allocation-state set to exchange.
- DR entity available for recovery (4): Returned for logical DR entities when the DR entity can be recovered by the platform and used by the partition performing a set-indicator RTAS call with allocation-state set to recover.
R1--3. For all DR options except the PCI Hot Plug and LRDR options: If the get-sensor-state call with the dr-entity-sense sensor requires the DR entity to be powered up and/or unisolated to sense the presence of the DR entity, then the get-sensor-state call must return the error code of -9000 or -9001, as defined in , if the DR entity is powered down or is isolated when the call is made.

get-sensor-state Error Status for All DR Options (Out, Status)

- -9000: Need DR entity to be powered up and unisolated before RTAS call
- -9001: Need DR entity to be powered up, but not unisolated, before RTAS call
- -9002: (see architecture note, directly below)
Architecture Note: The -9002 return code should not be implemented. For legacy implementations if it is returned, then it should be treated by the caller the same as a return value of 2 (DR entity unusable).
R1--4. For all DR options: The value used for the sensor-index input to the get-sensor-state RTAS call for the sensors in must be the index for the connector, as passed in the “ibm,drc-indexes” or “ibm,drc-info” property.

Hardware and Software Implementation Note: The status introduced in Requirement is not valid for get-sensor-state calls when trying to sense insertion status for PCI slots (see Requirement ).

Architecture Note: The DR entity available for recovery state is intended to allow a platform to temporarily allocate resources to itself on a reboot and then allow the OS to subsequently recover those resources when no longer needed by the platform. An example of use would be the platform temporarily reserving some LMBs to itself during a reboot to store dump data, and then making the LMBs available to an OS partition by marking them with the state of “available for recovery” after the dump data has been transferred to the OS.
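Programming Note: A presence probe for one connector, as a C sketch reusing the assumed rtas_call() interface from the earlier set-power-level example; the value names mirror the sensor table above:

    #include <stdint.h>

    #define DR_ENTITY_SENSE    9003 /* sensor token from the table above */

    #define DR_CONNECTOR_EMPTY 0
    #define DR_ENTITY_PRESENT  1
    #define DR_ENTITY_UNUSABLE 2
    #define DR_ENTITY_EXCHANGE 3
    #define DR_ENTITY_RECOVERY 4

    extern int rtas_call(int token, int nargs, int nret, int *outputs, ...);

    /* Query dr-entity-sense for one connector. drc_index is the value
     * from "ibm,drc-indexes" (or "ibm,drc-info"). On success (status 0),
     * *state holds one of the sensor values above. Status -9000/-9001
     * means the entity must be powered up (and possibly unisolated)
     * first; -3 means the connector is not allocated to this OS. */
    static int dr_entity_sense(int get_sensor_token, uint32_t drc_index,
                               int *state)
    {
        /* get-sensor-state: 2 inputs (sensor token, index),
         * 2 outputs (status, sensor state). */
        return rtas_call(get_sensor_token, 2, 2, state,
                         DR_ENTITY_SENSE, drc_index);
    }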
set-indicator

This RTAS call is defined as shown in . This RTAS call is used in DR to transition between isolation states, to transition between allocation states, and to control DR indicators. In some cases, a state transition fails due to various conditions; however, a null transition (commanding that the new state be what it already is) always succeeds. As a consequence, this RTAS call is used in all DR sequences to logically (and if necessary physically) isolate and unisolate the connection between a DR entity and the platform. If physical isolation is indeed required for the DR entity, this RTAS call determines the necessity for isolation, not the calling program.

The set-indicator allocation-state and set-indicator isolation-state are linked. Before calling set-indicator with isolation-state set to unisolate, the DR entity being unisolated will first need to be allocated to the OS. If the get-sensor-state call would return a value of DR entity unusable, or if it would return an error like -3 for the DR entity, then the set-indicator isolation-state to unisolate would fail for that DR entity.

For sequencing requirements between this RTAS routine and others, see Requirements and .

A single set-indicator operation for indicator type 9001 may require an extended period of time for execution. Following the initiation of the hardware operation, if the set-indicator call returns prior to successful completion of the operation, the call will return either a status code of -2 or 990x. A status code of -2 indicates that RTAS may be capable of doing useful processing immediately. A status code of 990x indicates that the platform requires an extended period of time, and hints at how much time will be required before completion status can be obtained. Neither the 990x nor the -2 status codes imply that the platform has initiated the operation, but it is expected that the 990x status would only be used if the operation had been initiated.

The following are the requirements for the base DR option. Other DR options may put additional requirements on this RTAS call. indicates which DR indicators are used with which DR connector types. The “rtas-indicators” and “ibm,indicator-<token>” OF properties are not applicable to DR indicators defined in .

R1--1. For all DR options: The indicator state values specified in must be implemented as specified in that table.

set-indicator Defined Indicators for All DR Options

isolation-state (token value 9001). Defined state values: Isolate (0), Unisolate (1). Default value: Unisolated.
    This indicator must be implemented for DR connectors for both physical and logical DR entities. Isolate refers to the DR action to logically disconnect the DR entity from the platform. An isolate operation makes the DR entity available to the firmware, and in the case of a physical DR entity like a PCI IOA, logically disconnects the DR entity from the platform (for example, from the PCI bus). Unisolate refers to the DR action to logically connect the entity. Before set-indicator isolation-state to unisolate, the DR entity being unisolated must first be allocated to the OS. If the get-sensor-state call with the dr-entity-sense token would return a value of DR entity unusable, or if it would return an error like -3 for the DR entity, then the set-indicator isolation-state to unisolate must fail for that DR entity.

dr-indicator (token value 9002). Defined state values: Inactive (0), Active (1), Identify (2), Action (3). Default value: 0 if Inactive, 1 if Active.
    This indicator must be implemented for DR connectors for physical DR entities. If the DR indicators exist for the DR connector, then they are used to indicate the state of the DR connector to the user. Usage of these states is as defined in and .

allocation-state (token value 9003). Defined state values: unusable (0), usable (1), exchange (2), recover (3). Default value: NA.
    This indicator must be implemented for DR connectors for logical DR entities. Used to allocate and deallocate entities to the OS. The initial allocation state of a connector is established based upon the initial allocation of resources to the OS image. Subsequently, an OS may request a change of allocation state by use of set-indicator with the allocation-state token. If the transition to the usable state is not possible, the -3 (no such indicator implemented) status is returned.
R1--2. For all DR options: The value used for the indicator-index input to the set-indicator RTAS call for the indicators in must be the index for the connector, as passed in the “ibm,drc-indexes” or “ibm,drc-info” property.

R1--3. For all DR options: The set-indicator call must return a -2 status, or optionally for indicator type 9001 the 990x status, for each call until the operation is complete; the 990x status is defined in .

R1--4. For all DR options: If this is a DR operation that involves the user inserting a DR entity, then if the firmware can determine that the inserted entity would cause a system disturbance, the set-indicator RTAS call must not unisolate the entity and must return an error status which is unique to the particular error.

R1--5. For all DR options: If the set-indicator index refers to a connector that would return a “DR entity unusable” status (2) to the get-sensor dr-entity-sense token, the set-indicator RTAS return code must be “No such indicator implemented” (-3), except in response to a successful set-indicator allocation-state usable.

R1--6. For all DR options combined with the LPAR option: The RTAS set-indicator specifying unusable allocation-state of a DR connector must unmap the resource from the partition’s Page Frame Table(s) and, as appropriate, its Translation Control Entry tables.

R1--7. For all DR options combined with the LPAR option: The successful completion of the RTAS set-indicator specifying usable allocation-state of a DR connector must allow subsequent mapping of the resource as appropriate within the partition’s Page Frame Table(s) and/or its Translation Control Entry tables.

Software Implementation Note: The operation of the set-indicator call is not complete at the time of the return from the call if the “busy” status is returned. If it is necessary to know when the operation is complete, the routine should be called with the same parameters until a non-busy status is returned.

Hardware and Software Implementation Note: The set-indicator (isolation-state) call is used to clear RTAS internal tables regarding this device. The ibm,configure-connector RTAS routine will need to be called before using the entities below this connector, even if power was never removed from an entity while it was in the isolated state.
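Programming Note: An isolation-state transition with the -2/990x retry behavior described above, as a C sketch; rtas_call() is the same assumed interface as in the earlier examples, and delay_for_hint() is a placeholder that sleeps for the period suggested by a 990x hint:

    #include <stdint.h>

    #define ISOLATION_STATE    9001 /* indicator token from the table above */
    #define ISOLATE            0
    #define UNISOLATE          1

    #define RTAS_BUSY          (-2)
    #define RTAS_EXT_DELAY_MIN 9900 /* 990x extended delay hint codes */
    #define RTAS_EXT_DELAY_MAX 9905

    extern int rtas_call(int token, int nargs, int nret, int *outputs, ...);
    extern void delay_for_hint(int status); /* placeholder back-off */

    static int dr_set_isolation(int set_indicator_token, uint32_t drc_index,
                                int state)
    {
        int status;

        for (;;) {
            /* set-indicator: 3 inputs (indicator token, index, new state),
             * 1 output (status). */
            status = rtas_call(set_indicator_token, 3, 1, NULL,
                               ISOLATION_STATE, drc_index, state);
            if (status == RTAS_BUSY ||
                (status >= RTAS_EXT_DELAY_MIN &&
                 status <= RTAS_EXT_DELAY_MAX))
                delay_for_hint(status); /* retry with the same parameters */
            else
                return status; /* 0 = success, -3 = no such indicator, ... */
        }
    }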
ibm,configure-connector RTAS Call

The RTAS function ibm,configure-connector is a new RTAS call introduced by DR and is used to configure a DR entity after it has been added by either an add or replace operation. It is expected that the ibm,configure-connector RTAS routine will have to be called several times to complete the configuration of a dynamic reconfiguration connector, due to the time required to complete the entire configuration process. The work area contains the intermediate state that RTAS needs to retain between calls. The work area consists of 4096 byte pages of real storage on 4096 byte boundaries, and can be increased by one page on each call. The OS may interleave calls to ibm,configure-connector for different dynamic reconfiguration connectors; however, a separate work area will be associated with each dynamic reconfiguration connector which is actively being configured. Other standard RTAS locking rules apply.

The properties generated by the ibm,configure-connector call are dependent on the type of DR entities. For a list of properties generated, see the RTAS Requirements section for each specific DR option. For example, for a list of properties generated for PCI Hot Plug, see .

For sequencing requirements between this RTAS routine and others, see Requirement .

R1--1. For all DR options: The RTAS function ibm,configure-connector must be implemented and must implement the argument call buffer defined by .

ibm,configure-connector Argument Call Buffer

In:
- Token: Token for ibm,configure-connector
- Number Inputs: 2
- Number Outputs: 1
- Work area: Address of work area
- Memory extent: 0 or address of additional page

Out:
- Status:
  -9003: Cannot configure - Logical DR connector unusable, available for exchange, or available for recovery
  -9002: Cannot configure - DR Entity cannot be supported in this connector
  -9001: Cannot configure - DR Entity cannot be supported in this system
  -2: Call again
  -1: Hardware error
  0: Configuration complete
  1: Next sibling
  2: Next child
  3: Next property
  4: Previous parent
  5: Need more memory
  990x: Extended delay
R1--2. For all DR options: On the first call of a dynamic reconfiguration sequence, the one page work area must be initialized by the OS as in .

Initial Work Area Initialization

- Entry offset 0: entry from the “ibm,drc-indexes” or “ibm,drc-info” property for the connector to configure
- Entry offset 1: 0
Architecture Note: The entry offset in is either four bytes or eight bytes depending on whether RTAS was instantiated in 32-bit or 64-bit mode, respectively.
R1--3. For all DR options: On all subsequent calls of the sequence, the work area must be returned unmodified from its state at the last return from RTAS. R1--4. For all DR options: The ibm,configure-connector RTAS call must update any necessary RTAS configuration state based upon the configuration changes effected through the specified DR connector.
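Programming Note: Initializing the first work area page per the table and architecture note above, as a C sketch; the entry width follows how RTAS was instantiated:

    #include <stdint.h>
    #include <string.h>

    #define CC_WORK_AREA_SIZE 4096

    /* Fill in the first ibm,configure-connector work area page: entry 0
     * is the DRC index to configure, entry 1 is 0. Entries are 4 bytes
     * wide if RTAS was instantiated in 32-bit mode, 8 bytes if in
     * 64-bit mode (see the architecture note above). */
    static void init_cc_work_area(void *work_area, uint64_t drc_index,
                                  int rtas_is_64bit)
    {
        memset(work_area, 0, CC_WORK_AREA_SIZE);

        if (rtas_is_64bit) {
            uint64_t *entry = work_area;
            entry[0] = drc_index;
            entry[1] = 0;
        } else {
            uint32_t *entry = work_area;
            entry[0] = (uint32_t)drc_index;
            entry[1] = 0;
        }
    }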
The sequence ends when RTAS returns either a “hardware error” or “configuration complete” status code, at which time the contents of the work area are undefined. If the OS no longer wishes to continue configuring the connector, the OS may recycle the work area and never recall RTAS with that work area. Unless the sequence ends with Configuration Complete, the OS will assume that any reported devices remain unconfigured and unusable. RTAS internal data structures (outside of the work area) are not updated until the call which returns “configuration complete” status. A subsequent sequence of calls to ibm,configure-connector with the same entry from the “ibm,drc-indexes” or “ibm,drc-info” property will restart the configuration of devices which were not completely configured.

If the index from “ibm,drc-indexes” or “ibm,drc-info” refers to a connector that would return a “DR entity unusable” status (2) to the get-sensor RTAS call with the dr-entity-sense token, the ibm,configure-connector RTAS call for that index immediately returns “-9003: Cannot configure - Logical DR connector unusable” on the first call, without any configuration action taken on the DR connector.

A dynamic reconfiguration connector may attach several sibling OF device tree architected devices. Each such device may be the parent of one or more device sub-trees. The ibm,configure-connector RTAS routine configures and reports the entire sub-tree of devices rooted in previously unconfigured architected devices found below the connector whose index is specified in the first entry of the work area, except those that are associated with an empty or unowned dynamic reconfiguration connector; where unowned refers to a DR connector that would return a DR entity unusable, a DR entity available for exchange, or a DR entity available for recovery value for a get-sensor dr-entity-sense sensor. Configuration proceeds in a depth first order.

If the ibm,configure-connector RTAS routine returns with the “call again” or 990x status, configuration is proceeding but had to be suspended to maintain the short execution time requirement of RTAS routines. No results are available. The OS should call the ibm,configure-connector RTAS routine, passing back the work area unmodified, at a later time to continue the configuration process.

If the ibm,configure-connector RTAS routine returns with a “Cannot configure - DR Entity cannot be supported in this connector” status, then there is a lack of one or more resources at this connector for this DR Entity, and there is at least one DR connector in the system into which this DR Entity can be configured. In this case, the DR program should indicate to the user that they need to consult the appropriate system documentation relative to the DR Entity that they are trying to insert into the system.

The “need more memory” status code is similar in semantics to the “call again” status. However, on the next ibm,configure-connector call, the OS will supply, via the Memory extent parameter, the address of another page of memory for RTAS to add to the work area in order for configuration to continue. On all other calls to ibm,configure-connector the contents of the Memory extent parameter should be 0. It is the responsibility of the OS to recover all work area memory after a sequence of ibm,configure-connector calls is completed.

Software Implementation Note: The OS may allocate the work area from contiguous virtual space and pass individual discontiguous real pages to ibm,configure-connector as needed.
If the ibm,configure-connector RTAS routine returns either the “next sibling” or “next child” status code, configuration has detected an architected OF device tree device, and the routine is returning its OF device tree node-name. Work area entry offset 2 contains an offset within the first page of the work area to a NULL terminated string containing the node-name. Note: if the caller needs to preserve this or any other returned parameters between the various calls of a configuration sequence, it will copy the value to its own area. Also, the first call returning configuration data will have a “next child” status code.

The “next property” status code indicates that a subsequent property is being returned for the device. Work area entry offset 2 contains an offset within the first page of the work area to a NULL terminated string containing the property name. Work area entry offset 3 contains the length of the property value in bytes. Work area entry offset 4 contains an offset within the first page of the work area to the value of the property.

Architecture Note: The ibm,configure-connector RTAS routine returns those applicable properties that can be determined without interpreting any FCode ROM which is associated with the IOA. Additionally, it is permissible for this RTAS call to be aware of various specific IOAs and emulate the action of any FCode associated with the IOA.

If the ibm,configure-connector RTAS routine returns the “previous parent” status code, it has come to the end of the string of siblings, and will back up the tree one level following its depth first order algorithm. The 2nd through 4th work area entries are undefined for this status code.

Software Implementation Notes:

1. Any attempts to configure an already configured connector, or one in progress of being configured, will produce unpredictable results.
2. The software will put the DR entity in the full on power state before issuing the ibm,configure-connector RTAS call to configure the DR entity.
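Programming Note: The traversal described above, as a C sketch of the caller's loop; status names mirror the argument buffer table, rtas_call() is the assumed interface used in the earlier examples, and alloc_page()/delay_for_hint() are placeholders. Real code would copy the node names and properties out of the work area at the marked points and build OS device nodes from them:

    #define CC_COMPLETE      0
    #define CC_NEXT_SIBLING  1
    #define CC_NEXT_CHILD    2
    #define CC_NEXT_PROPERTY 3
    #define CC_PREV_PARENT   4
    #define CC_NEED_MORE_MEM 5
    #define RTAS_BUSY        (-2)

    extern int rtas_call(int token, int nargs, int nret, int *outputs, ...);
    extern void *alloc_page(void);          /* placeholder 4 KB allocator */
    extern void delay_for_hint(int status); /* placeholder back-off */

    static int dr_configure_connector(int cc_token, void *work_area)
    {
        void *extent = NULL; /* Memory extent: 0 except after "need more memory" */
        int status;

        for (;;) {
            /* ibm,configure-connector: 2 inputs (work area, memory extent),
             * 1 output (status). The work area is passed back unmodified. */
            status = rtas_call(cc_token, 2, 1, NULL, work_area, extent);
            extent = NULL;

            switch (status) {
            case CC_COMPLETE:
                return 0;              /* sub-tree fully configured */
            case CC_NEXT_SIBLING:
            case CC_NEXT_CHILD:
                /* Work area entry offset 2 points at the node-name string;
                 * copy it out before the next call. */
                break;
            case CC_NEXT_PROPERTY:
                /* Entry offsets 2-4 give property name, value length,
                 * and value; copy them before the next call. */
                break;
            case CC_PREV_PARENT:
                /* Depth-first traversal backs up one level. */
                break;
            case CC_NEED_MORE_MEM:
                extent = alloc_page(); /* supply another page next call */
                break;
            default:
                if (status == RTAS_BUSY ||
                    (status >= 9900 && status <= 9905)) {
                    delay_for_hint(status); /* suspended; call again later */
                    break;
                }
                return status; /* -1 hardware error, -9001..-9003, ... */
            }
        }
    }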
For All DR Options - OS Requirements
Visual Indicator States

DR visual indicator usage will be as indicated in the following requirement, in order to provide for a consistent user interface across platforms. Information on implementation dependent aspects of the DR indicators can be found in .

R1--1. For all DR options: The visual indicators must be used as defined in .

Visual Indicator Usage

- Inactive: The DR connector is inactive and the entity may be removed or added without system disruption. For DR entities that require power off at the connector, the caller of set-indicator must turn power off prior to setting the indicator to this state. See also .
- Identify (Locate): This indicator state is used to allow the user to identify the physical location of the DR connector. This state may map to the same visual state (for example, blink rate) as the Action state, or may map to a different state. See also .
- Action: Used to indicate to the user the DR connector on which the user is to perform the current DR operation. This state may map to the same visual state (for example, blink rate) as the Identify state, or may map to a different state. See also .
- Active: The DR connector is active and entity removal may disrupt system operation. See also .
Other Requirements

R1--1. For all DR options: The OS must detect hierarchical power domains (as specified in the “power-domains-tree” property) and must handle those properly during a DR operation.

R1--2. For all DR options: When bringing a DR entity online, the OS must issue the following RTAS calls in the following order (see the sketch after this list):

1. If the power domain is not 0, then call set-power-level
2. set-indicator (with the isolation-state token and a state value of unisolate)
3. ibm,configure-connector

R1--3. For all DR options: When taking a DR entity offline, the OS must issue the following RTAS calls in the following order:

1. set-indicator (with the isolation-state token and a state value of isolate)
2. If the power domain is not 0, then call set-power-level

R1--4. When bringing a DR entity online that utilizes TCEs (see ), the OS must initialize the DR entity's TCEs.
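Programming Note: Composing the two call sequences from the helpers sketched in the preceding sections; struct drc and the token bookkeeping are assumptions of this sketch, not structures defined by the architecture. Note that, per the software implementation note for “ibm,drc-power-domains”, live insert connectors (domain -1) are also never passed to set-power-level:

    #include <stdint.h>

    struct drc {
        uint32_t index;        /* from "ibm,drc-indexes" / "ibm,drc-info"     */
        int      power_domain; /* from "ibm,drc-power-domains"; -1 = live insert */
        void    *work_area;    /* 4 KB-aligned page(s) for configure-connector */
    };

    /* RTAS tokens, looked up once from the /rtas node (assumed bookkeeping). */
    struct rtas_tokens {
        int set_power_level, set_indicator,
            configure_connector, get_sensor_state;
    };

    #define POWER_FULL_ON 100 /* assumed level encodings for "full on"/"off" */
    #define POWER_OFF     0
    #define ISOLATE       0
    #define UNISOLATE     1

    /* Helpers sketched earlier in this section. */
    extern int dr_set_power(int token, int domain, int level, int *level_set);
    extern int dr_set_isolation(int token, uint32_t drc_index, int state);
    extern int dr_configure_connector(int token, void *work_area);

    /* Bring a DR entity online per R1--2: power, unisolate, configure. */
    static int dr_online(const struct rtas_tokens *tok, struct drc *drc)
    {
        int level_set, status;

        if (drc->power_domain > 0) { /* skip domain 0 and live insert (-1) */
            status = dr_set_power(tok->set_power_level, drc->power_domain,
                                  POWER_FULL_ON, &level_set);
            if (status != 0)
                return status;
        }
        status = dr_set_isolation(tok->set_indicator, drc->index, UNISOLATE);
        if (status != 0)
            return status;
        return dr_configure_connector(tok->configure_connector, drc->work_area);
    }

    /* Take a DR entity offline per R1--3: isolate, then power off. */
    static int dr_offline(const struct rtas_tokens *tok, struct drc *drc)
    {
        int level_set, status;

        status = dr_set_isolation(tok->set_indicator, drc->index, ISOLATE);
        if (status != 0)
            return status;
        if (drc->power_domain > 0)
            status = dr_set_power(tok->set_power_level, drc->power_domain,
                                  POWER_OFF, &level_set);
        return status;
    }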
PCI Hot Plug DR Option

This section develops the requirements, over and above the base DR option requirements, that are unique to performing DR operations on PCI plug-in cards that do not share power domains with other PCI plug-in cards.
PCI Hot Plug DR - Platform Requirements

A method will be provided to isolate the plug-in card (power and logic signals) and to physically remove the plug-in card from the machine. The physical removal may pose an interesting mechanical challenge, due to the position of the card edge connector relative to the desired direction of insertion of the card from the outside of the machine. In addition, PCI plug-in cards may have internal cables and may span multiple slots. Such mechanical issues are not addressed by this architecture. This section describes the requirements for the platform when a platform implements the PCI Hot Plug DR option.

R1--1. For the PCI Hot Plug DR option: All platform requirements of the base DR option architecture must be met ( ).

R1--2. For the PCI Hot Plug DR option: All PCI requirements must be met (for example, timing rules, power slew rates, etc.) as specified in the appropriate PCI specifications, and in the .

R1--3. For the PCI Hot Plug DR option: The hardware must provide two indicators per PCI Hot Plug slot, and all the following must be true:

- One indicator must be green, and the platform must use the indicator to indicate the power state of the PCI Hot Plug slot, turning on the indicator when the slot power is turned on and turning off the indicator when the slot power is turned off.
- The other indicator must be amber, must be controllable by RTAS separately from all other indicators, and must be used as a slot Identify indicator, as defined in .

R1--4. For the PCI Hot Plug DR option: The hardware must provide a separate power domain for each PCI Hot Plug slot, controllable by RTAS, and that power domain must not be used by any other DR connector in the platform.

R1--5. For the PCI Hot Plug DR option: The hardware must provide the capability to RTAS to be able to read the insertion state of each PCI Hot Plug slot individually, and must provide the capability of reading this information independent of the power and isolation status of the plug-in card.

R1--6. For the PCI Hot Plug DR option: The hardware must provide individually controllable electrical isolation (disconnect) from the PCI bus for each PCI Hot Plug slot, controllable by RTAS, and this isolation, when set to the isolation mode, must protect against errors being introduced on the bus, and against damage to the plug-in cards or planars, during the plug-in card power up, power down, insertion, and removal.

R1--7. For the PCI Hot Plug option: A platform must prevent the change in frequency of a bus segment (for example, on the insertion or removal of a plug-in card) while that change of frequency would result in improper operation of the system.

R1--8. For the PCI Hot Plug option: For each PCI Hot Plug slot which will accept only 32-bit (data width) plug-in cards, the platform must:

- Accommodate plug-in cards requiring up to 64 MB of PCI Memory Space and 64 KB of PCI I/O space
- For TCE-mapped DMA address space, provide the capability to map simultaneously and at all times at least 128 MB of PCI Memory space for the slot.

R1--9. For the PCI Hot Plug option: For each PCI Hot Plug slot which will accept 64-bit (data width) plug-in cards, the platform must:

- Accommodate plug-in cards requiring up to 128 MB of PCI Memory Space and 64 KB of PCI I/O space
- For TCE-mapped DMA address space, provide the capability to map simultaneously and at all times at least 256 MB of PCI Memory space for the slot.

R1--10. For the PCI Hot Plug option with PCI Express: The power and isolation controls must be implemented by use of the PCI Standard Hot-Plug Controller (see ).

R1--11. For the PCI Hot Plug option with PCI Express: If a PCI Hot Plug DRC contains multiple PEs, then that DRC must be owned by the platform or a trusted platform agent.

Hardware Implementation Notes:

1. Surge current protection on the planar is one way to provide the required protection against damage to components if an entity is removed from or inserted into a connector with the power still applied to the connector.
2. Removal of an entity without the proper quiescing operation may result in a system crash. In order for hot plugging of PCI plug-in cards with the system operational to be useful, a mechanical means is needed in order to be able to remove or insert PCI plug-in cards without shutting off system power and without removing the covers above the plug-in cards (which, in general, would require powering-down the system).
3. It is recommended that the control of the indicators required by Requirement be via the PCI Standard Hot-Plug Controller (see ).
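Programming Note: A PCI Hot Plug add, combining the presence probe with the online sequence from the earlier sketches; per the state diagram note earlier, an empty connector is never powered on:

    #include <stdint.h>

    #define DR_ENTITY_PRESENT 1 /* sensor value from the get-sensor-state table */

    struct rtas_tokens; /* defined in the earlier sketch */
    struct drc;         /* defined in the earlier sketch */

    /* Helpers from the earlier sketches in this document. */
    extern int dr_entity_sense(int token, uint32_t drc_index, int *state);
    extern int dr_online(const struct rtas_tokens *tok, struct drc *drc);
    extern uint32_t drc_index_of(const struct drc *slot); /* placeholder accessor */
    extern int get_sensor_token_of(const struct rtas_tokens *tok); /* placeholder */

    static int pci_hot_plug_add(const struct rtas_tokens *tok, struct drc *slot)
    {
        int state;
        int status = dr_entity_sense(get_sensor_token_of(tok),
                                     drc_index_of(slot), &state);

        if (status != 0)
            return status;      /* e.g. -3: connector not owned by this OS */
        if (state != DR_ENTITY_PRESENT)
            return -1;          /* empty slot: do not attempt to power on */

        return dr_online(tok, slot); /* power on, unisolate, configure */
    }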
PCI Hot Plug DR - Boot Time Firmware Requirements

R1--1. For the PCI Hot Plug DR option: All OF requirements of the base DR option architecture must be met ( ).

R1--2. For the PCI Hot Plug DR option: The OF must only generate the “clock-frequency” OF property for PCI bridge nodes which cannot change bus clock frequency during a PCI Hot Plug operation.

R1--3. For the PCI Hot Plug DR option: The OF must set the PCI configuration register bits and fields appropriately.

Hardware Implementation Note: The OF should leave sufficient gaps in the bus numbers when configuring bridges and switches such that plug-in cards with bridges and switches which are to be supported by the platform’s DR operations can be plugged into every slot in the platform in which those plug-in cards are supported. That is, insertion of a plug-in card that contains a bridge or switch into a platform requires that there be sufficient available bus numbers allocated to that PCI bus such that new bus numbers can be assigned to the buses generated by the bridges and switches on the plug-in cards.
PCI Hot Plug DR - Run Time Firmware Requirements

R1--1. For the PCI Hot Plug DR option: All RTAS requirements of the base DR option architecture must be met ( ).

R1--2. For the PCI Hot Plug DR option: The set-indicator RTAS call with an indicator type of isolation-state and a state value of unisolate (1) must not return a “success” status until any IOA on a plug-in card inserted into the PCI slot is ready to accept configuration cycles, and must return a “success” status if the PCI slot is empty.

R1--3. For the PCI Hot Plug DR option: The ibm,configure-connector RTAS call must initialize the PCI configuration registers and platform to the same values as at boot time.

Architecture Note: During a DR replace operation, the replacement PCI IOA may not get placed back at the same addresses, etc., as the original DR entity by the firmware (although it has to be placed back into the same DR connector, or it is not a DR replace operation). On a replace operation, the configuration information cannot reliably be read from the IOA being replaced (the IOA might be broken), so the firmware cannot read the configuration information from the old IOA and write it into the new IOA.

PCI I/O sub-systems architecturally consist of two classes of devices: bus bridges (Processor Host Bridges (PHBs), PCI to PCI bridges, and PCI Express switches and bridges) and IOAs. The support that ibm,configure-connector provides for these two classes differs. For bus bridges, the firmware totally configures the bridge so that it can probe down the depth of the tree. For this reason, the firmware must include support for all bridges the platform supports. This includes interrupt controllers as well as miscellaneous unarchitected devices that do not appear in the OF device tree. The properties supported and reported are the same as provided by the boot time firmware. For PCI plug-in cards, the support is significantly less; it is essentially the functionality specified in section 2.5 FCode Evaluation Semantics of the . However, the configuration proceeds as if all devices do not have an expansion ROM, since the RTAS code does not attempt to determine whether an FCode ROM is present nor attempt to execute it. This may, in some cases, generate different device node properties, values, and methods than would result had the IOA been configured during boot. If the IOA’s device driver or configuration support cannot deal with such differences, then the IOA is not dynamically reconfigurable. The other properties generated are dependent upon the IOA’s configuration header, from the following list. If a property is not on this list, the reader should assume that RTAS ibm,configure-connector will not generate it. shows what PCI OF properties can be expected to be returned from the ibm,configure-connector call for PCI Hot Plug operations, and shows some which can be expected to not be returned.

R1--4. For the PCI Hot Plug DR option: The ibm,configure-connector RTAS call when used for PCI IOAs must return the properties named in except as indicated in the Present?/Source column.

PCI Property Names which will be Generated by ibm,configure-connector

Property Name: Present?/Source
“name”: Always present.
“vendor-id”: Always present. From PCI header.
“device-id”: Always present. From PCI header.
“revision-id”: Always present. From PCI header.
“class-code”: Always present. From PCI header.
“interrupts”: Only present if Interrupt Pin register not 0.
“min-grant”: Present unless Header Type is 0x01.
“max-latency”: Present unless Header Type is 0x01.
“devsel-speed”: Only present for conventional PCI and PCI-X.
“compatible”: Always present. Constructed from the PCI header information for the IOA or bridge.
“fast-back-to-back”: Only present for conventional PCI and PCI-X when Status Register bit 7 is set.
“subsystem-id”: Only present if “Subsystem ID” register not 0.
“subsystem-vendor-id”: Only present if “Subsystem vendor ID” register not 0.
“66mhz-capable”: Only present for conventional PCI and PCI-X when Status Register bit 5 is set.
“133mhz-capable”: Only present for PCI-X when PCI-X Status Register bit 17 is set.
“266mhz-capable”: Only present for PCI-X when PCI-X Status Register bit 30 is set.
“533mhz-capable”: Only present for PCI-X when PCI-X Status Register bit 31 is set.
“reg”: Always present. Specifies address requirements.
“assigned-addresses”: Always present. Specifies address assignment.
“ibm,loc-code”: Always present. RTAS will have to remember the location codes associated with all DR connectors so that it can build this property.
“ibm,my-drc-index”: Always present.
“ibm,vpd”: Always present for sub-systems and for PCI IOAs which follow the PCI VPD proposed standard. See and note to see the effect of using different PCI versions.
“device_type”: For bridges, always present with a value of “PCI”; otherwise not present.
“ibm,req#msi”: Present for all PCI Express IOA nodes which are requesting MSI support, when the platform supports MSIs.
is a non-exhaustive list of common properties that may not be generated by RTAS ibm,configure-connector for a PCI IOA. Also, the concept of a phandle does not apply to nodes reported by ibm,configure-connector.

Non-exhaustive list of PCI properties that may not be generated by ibm,configure-connector

Property Name: Present?/Source
“ibm,connector-type”: Never present -- only for built-in entries, not for pluggable ones.
“ibm,wrap-plug-pn”: Never present -- only for built-in entries, not for pluggable ones.
“alternate-reg”: Never present -- needs FCode.
“fcode-rom-offset”: Never present -- RTAS does not look for this.
“wide”: Never present -- needs FCode.
“model”: Never present -- needs FCode.
“supported-network-types”: Never present -- needs FCode.
“address-bits”: Never present -- needs FCode.
“max-frame-size”: Never present -- needs FCode.
“local-mac-address”: Never present -- needs FCode.
“mac-address”: Never present -- needs FCode.
“built-in”: Not present for PCI Hot Plug connectors.
Architecture Note: Without “device_type” and other properties, the OS cannot append an IOA added via DR to the boot list for use during the next boot.
R1--5. For the PCI Hot Plug option: When the ibm,configure-connector RTAS call returns to the caller, if the device driver(s) for any IOA(s) configured as part of the call are EEH unaware (that is, may produce data integrity exposures due to an EEH stopped state), or if it cannot be determined whether they are EEH aware, then the ibm,configure-connector call must disable EEH prior to returning to the caller.

Software Implementation Note: To be EEH aware, a device driver does not need to be able to recover from an EEH stopped state; it need only recognize the all-1’s condition and not use data from operations that may have occurred since the last all-1’s checkpoint. In addition, the device driver under such failure circumstances needs to turn off interrupts (using the ibm,set-int-off RTAS call) in order to make sure that any (unserviceable) interrupts from the IOA do not affect the system. Note that this is the same device driver support needed to protect against an IOA dying or against a no-DEVSEL type error (which may or may not be the result of an IOA that has died). Note that if all-1’s data may be valid, the ibm,read-slot-reset-state2 RTAS call should be used to discover the true EEH state of the device.
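To make the EEH-awareness behavior concrete, the following is a minimal C sketch of the read-path check described in the note above. The wrapper names and simplified signatures (rtas_read_slot_reset_state2, rtas_set_int_off, mmio_read32) are hypothetical stand-ins for platform-provided RTAS bindings; only the RTAS call names themselves come from this architecture.

    #include <stdint.h>

    /* Hypothetical OS bindings to the RTAS calls named in this section.
     * The real calls take PCI config addresses and PHB identifiers as
     * defined by the RTAS architecture; the signatures are simplified. */
    extern int rtas_read_slot_reset_state2(uint32_t cfg_addr, int *eeh_stopped);
    extern int rtas_set_int_off(uint32_t cfg_addr);
    extern uint32_t mmio_read32(const volatile uint32_t *reg);

    /* Returns 0 if *val holds usable data, -1 if the IOA is in the EEH
     * stopped state and data since the last all-1's checkpoint must be
     * discarded by the driver. */
    static int eeh_aware_read32(uint32_t cfg_addr,
                                const volatile uint32_t *reg, uint32_t *val)
    {
        *val = mmio_read32(reg);
        if (*val != 0xFFFFFFFFu)
            return 0;            /* cannot be a stopped-state symptom */

        /* All-1's may be valid register contents; ask the firmware for
         * the true EEH state (ibm,read-slot-reset-state2). */
        int stopped = 0;
        if (rtas_read_slot_reset_state2(cfg_addr, &stopped) == 0 && stopped) {
            /* Quiesce the now-unserviceable interrupt source
             * (ibm,set-int-off) before reporting the failure. */
            rtas_set_int_off(cfg_addr);
            return -1;
        }
        return 0;                /* genuinely all-1's data */
    }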
PCI Hot Plug DR - OS Requirements

R1--1. For the PCI Hot Plug DR option: All OS requirements of the base DR option architecture must be met ( ).
Logical Resource Dynamic Reconfiguration (LRDR)

The Logical Resource Dynamic Reconfiguration option allows a platform to make available and recover platform resources such as CPUs, Memory Regions, Processor Host Bridges, and I/O slots to/from its operating OS image(s). The Logical Resource Dynamic Reconfiguration option provides the means for delivering capacity on demand to the running OS and provides the capability for the platform to make available spare parts (for example, CPUs) to replace failing ones (called sparing operations). Combined with the LPAR option, platforms can move resources between partitions without rebooting the partitions’ OS images.

The Logical Resource Dynamic Reconfiguration (LRDR) option deals with logical rather than physical resources. These logical resources are already physically installed (dynamic installation/removal of these resources, if supported, is managed via the Hardware Management Console (HMC) or Service Focal Point (SFP)). As such, the OS does not manage either connector power or DR visual indicators. Logical connector power domains are specified as “hot pluggable” (value -1) and DR visual indicators are not defined for logical connectors.

The device tree contains logical resource DR connectors for the maximum number of resources that the platform can allocate to the specific OS. In some cases, such as for processors and PHBs, this may be the maximum number of these resources that the platform supports, even if fewer than that are currently installed. In other cases, such as memory regions in an LPARed system, the number may be limited to the amount of memory that can be supported without resizing the CPU page frame table.

The OS may use the get-sensor-state RTAS call with the dr-entity-sense token to determine if a given drc-index refers to a connector that is currently usable for DR operations. If the connector is not currently usable, the return state is “DR entity unusable” (2). A set-indicator (isolation-state) RTAS call to an unusable connector, or a set-indicator (dr-indicator) call to any logical resource connector, results in a “No such indicator implemented” return status.

Two allocation models are supported. In the first, resources are specifically assigned to one and only one partition at a time by the HMC. In this model, a DR entity state is changed from unusable to usable only by firmware, in response to HMC requests to explicitly move the allocation of the resource between partitions. In the second model, certain resources may “float” between cooperating partitions: a partition issues a set-indicator (allocation-state, usable) RTAS call, and if the resource is free, the firmware assigns the resource to the requesting partition and returns a success status. Set-indicator returns the code “no-such-indicator” if either the resource is not free, or the platform is operating in the first model. To return a resource to the platform firmware, the OS issues a set-indicator (allocation-state, unusable) RTAS call for the resource’s DR connector.
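As an illustration of the second (floating) allocation model, the following C sketch attempts to claim a free resource. The wrapper names are hypothetical; the allocation-state indicator token (9003) is given later in this chapter, while the dr-entity-sense sensor token and the state encodings shown are assumptions drawn from the DR sensor/indicator definitions referenced here.

    /* Hypothetical RTAS bindings. */
    extern int rtas_get_sensor(int token, unsigned int drc_index, int *state);
    extern int rtas_set_indicator(int token, unsigned int drc_index, int state);

    #define DR_ENTITY_SENSE    9003  /* dr-entity-sense sensor token (assumed) */
    #define ALLOCATION_STATE   9003  /* allocation-state indicator token */
    #define DR_ENTITY_UNUSABLE 2     /* “DR entity unusable” */
    #define ALLOC_USABLE       1     /* allocation-state “usable” value (assumed) */

    /* Try to allocate a floating resource to this partition.  A failing
     * set-indicator (“no-such-indicator”) means the resource is not free,
     * or the platform is operating in the HMC-assigned (first) model. */
    static int try_allocate(unsigned int drc_index)
    {
        int state;

        if (rtas_get_sensor(DR_ENTITY_SENSE, drc_index, &state) != 0)
            return -1;
        if (state != DR_ENTITY_UNUSABLE)
            return 0;    /* already usable by this partition */
        return rtas_set_indicator(ALLOCATION_STATE, drc_index, ALLOC_USABLE);
    }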
Platform Requirements for LRDR

The following requirements apply to the hardware and/or firmware as a result of implementing LRDR on a platform.

R1--1. For the LRDR option: The hardware must provide the capability to power-cycle any hardware that is going to be switched between partitions as part of LRDR, if that hardware requires power-cycling to put it into a known state (for example, PCI IOAs).

Architecture Note: Except for PCI Express IOAs that implement the Function Level Reset (FLR) option, since the PCI architecture is not specific as to the state of an IOA when the IOA's reset is activated and deactivated, either the platform designer will need to guarantee that all logic in all IOAs (including any internal storage associated with the IOA) is cleared to a known state by use of the IOA's reset, or else the platform will need to provide the capability to power-cycle those IOAs, including the integrated ones (that is, including the non-pluggable ones). Also note that hardware which requires power-cycling to initialize may impact the capability to reliably reboot an OS, independent of whether or not LRDR is implemented.

R1--2. For the LRDR option: Any power-cycling of the hardware which is done by the platform during an LRDR operation (for example, as part of an ibm,configure-connector operation) must be functionally transparent to the software, except that PCI plug-in cards that are plugged into a PCI Hot Plug DR connector do not need to be powered on before the ibm,configure-connector call for a logical SLOT DR connector returns to the caller.

Architecture Note: PCI plug-in cards that are plugged into a DR connector will not be configured as part of an ibm,configure-connector operation on a logical DR connector of type SLOT above the plug-in card (see section 17.6.3.3 ibm,configure-connector). However, Requirement does require that a PCI IOA which is not plugged into a PCI Hot Plug DR connector (for example, soldered on the planar) be powered up and configured as a result of an ibm,configure-connector operation on a logical DR connector of type SLOT above such an IOA, and requires this powering up to be functionally transparent to the caller of the ibm,configure-connector operation (a longer busy time is not considered to be a violation of the functional transparency requirement).
DR Properties for Logical Resources

Logical resource dynamic reconfiguration is a special case of general DR; therefore, certain DR properties take on special values.

DR Property Values for Logical Resources

Property Name: Property Value
“ibm,drc-indexes”: As defined in .
“ibm,my-drc-index”: As defined in .
“ibm,drc-names”: As defined in . Note: This name allows for correlation between the OS and HMC user interfaces.
“ibm,drc-power-domains”: Logical resource connectors are defined to be “hot pluggable”, having a domain value of -1 per the definition in .
“ibm,drc-types”: Shall be one of the values “CPU”, “MEM”, “PHB”, or “SLOT” as defined in .
“ibm,drc-info”: As defined in .
R1--1. For the LRDR option: All platform requirements of the base DR option architecture must be met ( ).

R1--2. For the LRDR option: The /cpus OF device tree node must include either the “ibm,drc-info” property or the following four properties: “ibm,drc-types”, “ibm,drc-names”, “ibm,drc-indexes” and “ibm,drc-power-domains”. The drc-type must be type CPU, and the drc-power-domain must have the value -1. The property or properties must contain entries for each potentially supported dynamically reconfigurable processor.

R1--3. For the LRDR option: The root node of the OF device tree must include either the “ibm,drc-info” property or the following four properties: “ibm,drc-indexes”, “ibm,drc-names”, “ibm,drc-types” and “ibm,drc-power-domains”. The drc-type must be type MEM, and the drc-power-domain must have the value -1. The property or properties must contain entries for each potentially supported dynamically reconfigurable memory region.

R1--4. For the LRDR option: The root node of the OF device tree must not include any drc properties (“ibm,drc-*”) for the base memory region (reg value 0).

R1--5. For the LRDR option: The root node of the OF device tree must include either the “ibm,drc-info” property or the following four properties: “ibm,drc-indexes”, “ibm,drc-names”, “ibm,drc-types” and “ibm,drc-power-domains”. The drc-type must be type PHB, and the drc-power-domain must have the value -1. The property or properties must contain entries for each potentially supported dynamically reconfigurable PHB.

R1--6. For the LRDR option: The /pci OF device tree node representing a PHB must include either the “ibm,drc-info” property or the following four properties: “ibm,drc-indexes”, “ibm,drc-names”, “ibm,drc-types” and “ibm,drc-power-domains”. The drc-type must be type SLOT, and the drc-power-domain must have the value -1. The property or properties must contain entries for each potentially supported dynamically reconfigurable PCI SLOT.

R1--7. For the LRDR option: Platforms must implement the allocation-state indicator 9003, as defined in .

R1--8. For the LRDR option: For memory LRDR, the “ibm,lrdr-capacity” property must be included in the /rtas node of the partition device tree (see ).
Architectural Intent -- Logical DR Sequences: This architecture is designed to support the logical DR sequences specified in the following sections. See also .
Acquire Logical Resource from Resource Pool

1. The OS responds to some stimuli (command, workload manager, HMC, etc.) to acquire the resource, perhaps using the “ibm,drc-names” value as a reference if a human interface is involved.
2. The OS determines if the resource is usable:
   a. The OS uses get-sensor-state (dr-entity-sense) to determine the state of the DR connector.
   b. If the state is “unusable”, the OS issues set-indicator (allocation-state, usable) to attempt to allocate the resource. Similarly, if the state is “available for exchange” the OS issues set-indicator (allocation-state, exchange), and if the state is “available for recovery” the OS issues set-indicator (allocation-state, recover) to attempt to allocate the resource.
   c. If successful, continue; else return error status to the requester. If successful, this is the point where the resource is allocated to the OS.
3. The OS unisolates the resource via set-indicator (isolation-state, unisolate). This is the point where the OS takes ownership of the resource from the platform firmware and the firmware removes the resource from its resource pool.
4. The OS configures the resource using the ibm,configure-connector RTAS call.
5. The OS incorporates the resource into its resource pool. If the resource is a processor, the OS must use the start-cpu RTAS call to move the processor from the stopped state (at the end of the ibm,configure-connector) to the running state.
6. The OS returns the status of the operation to the requester.
7. The OS notifies the requesting entity of the OS state relative to the resource acquisition.

A sketch of this sequence appears below.
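The following C sketch shows the acquisition sequence end to end, continuing the hypothetical wrappers and token definitions from the earlier sketch; the isolation-state token, the unisolate/exchange/recover encodings, and the configure-connector wrapper are likewise assumptions rather than values mandated here.

    #define ISOLATION_STATE 9001   /* isolation-state indicator token (assumed) */
    #define UNISOLATE       1      /* unisolate value, per the run-time section */
    #define ALLOC_EXCHANGE  2      /* assumed allocation-state values */
    #define ALLOC_RECOVER   3

    extern int os_configure_connector(unsigned int drc_index); /* wraps
                                                ibm,configure-connector */

    static int acquire_resource(unsigned int drc_index)
    {
        int state, rc;

        /* Step 2: determine usability and allocate if necessary. */
        if (rtas_get_sensor(DR_ENTITY_SENSE, drc_index, &state) != 0)
            return -1;
        switch (state) {
        case 2:  /* unusable */
            rc = rtas_set_indicator(ALLOCATION_STATE, drc_index, ALLOC_USABLE);
            break;
        case 3:  /* available for exchange */
            rc = rtas_set_indicator(ALLOCATION_STATE, drc_index, ALLOC_EXCHANGE);
            break;
        case 4:  /* available for recovery */
            rc = rtas_set_indicator(ALLOCATION_STATE, drc_index, ALLOC_RECOVER);
            break;
        default:
            rc = 0;   /* already usable */
            break;
        }
        if (rc != 0)
            return rc;   /* report error status to the requester */

        /* Step 3: ownership transfers from the firmware to the OS here. */
        if (rtas_set_indicator(ISOLATION_STATE, drc_index, UNISOLATE) != 0)
            return -1;

        /* Steps 4-5: configure; for a CPU, start-cpu follows separately. */
        return os_configure_connector(drc_index);
    }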
Release Logical Resource

1. Some entity (system administrator commanding from the HMC, a workload manager, etc.) requests that the OS release the resource, using the “ibm,drc-names” value as a reference.
2. The OS attempts to stop using the logical resource. If the resource is a processor, the OS calls the stop-self RTAS call and then waits for the processor to enter the stopped state using the query-cpu-stopped-state RTAS call.
3. The OS isolates the resource via set-indicator (isolation-state, isolate).
4. Unless the isolated resource was the partition’s last processor, the OS deallocates the resource via set-indicator (allocation-state, unusable). This is the point where the platform firmware takes ownership of the resource from the OS. That is, the OS removes the resource from its resource pool and the firmware adds it to the firmware resource pool.
5. The OS returns the status of the operation to the requester.
6. The system administrator may command the HMC to allocate the logical resource to another partition (LPAR) or to the reserved pool (COD). Any needed hardware removal is handled by the HMC/SFP.

A sketch of this sequence appears below.
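A matching C sketch of the release sequence, under the same assumptions as the previous sketches (the thread-stopping precondition for processors is summarized in the comment and detailed in the CPU isolation section below):

    #define ISOLATE         0   /* isolation-state “isolate” value (assumed) */
    #define ALLOC_UNUSABLE  0   /* allocation-state “unusable” value (assumed) */

    /* Release a logical resource back to the platform firmware pool.
     * For a processor, all threads must already have entered the RTAS
     * stopped state (stop-self / query-cpu-stopped-state). */
    static int release_resource(unsigned int drc_index, int is_last_processor)
    {
        if (rtas_set_indicator(ISOLATION_STATE, drc_index, ISOLATE) != 0)
            return -1;
        if (is_last_processor)
            return 0;   /* isolating the last processor kills the partition */
        /* The firmware takes ownership of the resource here. */
        return rtas_set_indicator(ALLOCATION_STATE, drc_index, ALLOC_UNUSABLE);
    }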
RTAS Call Semantics/Restrictions

This section describes the unique application of DR RTAS functions to the dynamic reconfiguration of logical resources.
set-indicator (isolation-state, isolate)

Dynamic reconfiguration of logical resources introduces special meaning and restrictions to the DR connector isolation function, depending upon the logical resource being isolated.
Isolation of CPUs

The isolation of a CPU is, in all cases, preceded by the stop-self RTAS function for all processor threads, and the OS ensures that all the CPU’s threads are in the RTAS stopped state prior to isolating the CPU. Isolation of a processor that is not stopped produces unpredictable results. The stopping of the last processor thread of an LPAR partition effectively kills the partition, and at that point, ownership of all partition resources reverts to the platform firmware.

R1--1. For the LRDR option: Prior to issuing the RTAS set-indicator specifying the isolate isolation-state of a CPU DR connector type, all the CPU threads must be in the RTAS stopped state.

R1--2. For the LRDR option: Stopping the last processor thread of an LPAR partition with the stop-self RTAS function must kill the partition, with ownership of all partition resources reverting to the platform firmware.
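A C sketch of the precondition in Requirement R1--1: every thread of the departing CPU enters the RTAS stopped state before the isolate is issued. The wrapper names, the IPI helper, and the meaning of the returned state value 0 are assumptions; note that stop-self must be invoked by each departing thread on itself.

    extern int  rtas_query_cpu_stopped_state(unsigned int thread_id, int *state);
    extern void request_stop_self(unsigned int thread_id); /* hypothetical:
                                  makes the target thread call stop-self */

    /* Returns 0 once all threads report the RTAS stopped state; only then
     * may set-indicator (isolation-state, isolate) be issued for the CPU. */
    static int stop_cpu_threads(const unsigned int *thread_ids, int nthreads)
    {
        int i, state;

        for (i = 0; i < nthreads; i++)
            request_stop_self(thread_ids[i]);

        for (i = 0; i < nthreads; i++) {
            do {
                if (rtas_query_cpu_stopped_state(thread_ids[i], &state) != 0)
                    return -1;
            } while (state != 0);   /* 0 assumed to mean “stopped” */
        }
        return 0;
    }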
Isolation of MEM Regions

Isolation of a MEM region creates a paradox if the MEM region being isolated contains the calling program (there being no program left for the firmware to return to).

Note: The base memory region (starting at address zero) is not associated with a MEM DR connector. This means that the base memory region cannot be isolated. This restriction avoids two fatal conditions: attempts to isolate the region containing RTAS, and attempts to isolate the region containing the interrupt vectors.

It is the responsibility of the OS to unmap the addresses of the MEM region being isolated from both the PFT and the TCE tables. When the LRDR option is combined with the LPAR option, the hypervisor ensures that the addresses of the MEM region being isolated are unmapped from both the PFT and TCE tables before successfully completing the isolation of the MEM region. If any valid mappings are found, the RTAS set-indicator (isolation-state) does not change the isolation-state and returns with a Status -9001 (Valid outstanding translation).

R1--1. For the LRDR option: The caller of the RTAS set-indicator specifying the isolate isolation-state of a MEM DR connector type must not be within the region being isolated.

R1--2. For the LRDR option combined with the LPAR option: The RTAS set-indicator specifying the isolate isolation-state of a MEM DR connector type must check that the region is unmapped from both the partition’s Page Frame Table(s) and any Translation Control Entries that would reference the memory; otherwise the RTAS routine must return with a Status -9001 (Valid outstanding translation) and the isolation-state is not changed.

Implementation Note: The algorithm chosen for implementing Requirement depends upon the expected frequency of isolation events. For RAS reasons, they should be seldom; for load balancing, they may be far more frequent. The methods are briefly described here. First, pull the corresponding logical address from the partition’s valid space so that setting new translations to the logical address is not possible, then wait for any in-flight translation additions to complete. Follow this by either scanning the entire PFT and TCE tables looking for valid translations, or checking a use count for the particular logical address range. The PFT/TCE table search may be long; however, it is only done at isolation time. The use count method requires that each add and remove of an address translation update a use count indexed by the physical real address of the memory block.
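The use-count method from the implementation note can be sketched as follows in C. The 16 MB block granularity, the table size, and the function names are all illustrative assumptions, and a real hypervisor would make the counter updates atomic and ordered against in-flight translation additions.

    #include <stdint.h>

    #define BLOCK_SHIFT 24                 /* assumed 16 MB block granularity */
    #define MAX_BLOCKS  (1u << 16)         /* assumed: covers 1 TB of real memory */

    static uint32_t map_count[MAX_BLOCKS]; /* one counter per memory block */

    /* Called on every PFT or TCE mapping add/remove for real address 'ra'. */
    static void translation_added(uint64_t ra)   { map_count[ra >> BLOCK_SHIFT]++; }
    static void translation_removed(uint64_t ra) { map_count[ra >> BLOCK_SHIFT]--; }

    /* Isolation-time check for a MEM region [base, base+size): instead of
     * scanning the PFT and TCE tables, inspect one counter per block. */
    static int mem_region_can_isolate(uint64_t base, uint64_t size)
    {
        uint64_t ra;

        for (ra = base; ra < base + size; ra += (1ULL << BLOCK_SHIFT))
            if (map_count[ra >> BLOCK_SHIFT] != 0)
                return -9001;   /* Status: Valid outstanding translation */
        return 0;               /* safe to complete the isolation */
    }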
Isolation of PHBs and Slots

Isolation of a PHB naturally disconnects the OS image from any of the DR connectors downstream of the PHB (specifically, any I/O slots and PCI Hot Plug connectors associated with the PHB). To avoid the complexity of gracefully managing multi-level isolation, isolation is restricted to “leaf” DR connectors only, that is, connectors that have no unisolated or usable DR connectors below them. That is, for logical DR connectors below the connector being isolated, a get-sensor-state dr-entity-sense needs to return “unusable” (2), and physical DR entities below the connector being isolated need to be isolated first via set-indicator (isolation-state, isolate).

The OS is responsible for removing all virtual address mappings to the address range associated with a logical I/O SLOT before making the RTAS set-indicator (isolation-state) call that isolates the SLOT. When the LRDR option is combined with the LPAR option, the hypervisor ensures that the addresses associated with the logical SLOT being isolated are unmapped from both the PFT and TCE tables before successfully completing the isolation of the SLOT connector. If any valid mappings are found, the RTAS set-indicator (isolation-state) does not change the isolation-state and returns with a Status -9001 (Valid outstanding translation).

R1--1. For all LRDR options: If a request to set-indicator (isolation-state, isolate) would result in the isolation of one or more other DR connectors which are currently unisolated or usable, then the set-indicator RTAS call must fail with a return code of “Multi-level isolation error” (-9000).

R1--2. For the LRDR option combined with the LPAR option: The RTAS set-indicator specifying the isolate isolation-state of a SLOT DR connector type must check that the IOA address range associated with the slot is unmapped from both the partition’s Page Frame Table(s) and any Translation Control Entries that would reference those locations; otherwise the RTAS routine must return with a Status -9001 (Valid outstanding translation) and the isolation-state is not changed.
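The “leaf only” restriction can be expressed as the following firmware-side check, a C sketch under the same assumptions as the earlier LRDR examples. Child-connector enumeration is hypothetical, and for simplicity only logical children are shown; physical child connectors would instead be checked for the isolated state.

    /* Before honoring set-indicator (isolation-state, isolate) on a PHB or
     * SLOT, verify that no DR connector below it is unisolated or usable. */
    static int leaf_isolation_check(const unsigned int *child_drc, int nchildren)
    {
        int i, state;

        for (i = 0; i < nchildren; i++) {
            if (rtas_get_sensor(DR_ENTITY_SENSE, child_drc[i], &state) != 0)
                return -1;
            if (state != DR_ENTITY_UNUSABLE)
                return -9000;   /* “Multi-level isolation error” */
        }
        return 0;
    }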
Isolation of Coherent Platform Facilities

Isolation of a Coherent Platform Facility (COPLATFAC) disconnects the OS image from any of the DR connectors downstream (Coherent Platform Functions, or COPLATFUN) of the Coherent Platform Facility. To avoid the complexity of gracefully managing multi-level isolation, isolation is restricted to “leaf” DR connectors only, that is, connectors that have no unisolated or usable DR connectors below them. All COPLATFUN connectors must return the unusable state (2) from get-sensor-state before a COPLATFAC can be isolated; the isolation itself is performed through set-indicator (isolation-state, isolate). The OS is responsible for removing all memory mappings and detaching all processes potentially in use by coherent platform functions. If valid mappings or processes are found, the set-indicator does not change the isolation-state and returns with a Status -9001 (Valid outstanding translation).

R1--1. For all LRDR options: If a request to set-indicator (isolation-state, isolate) would result in the isolation of one or more other DR connectors which are currently unisolated or usable, then the set-indicator RTAS call must fail with a return code of “Multi-level isolation error” (-9000).

R1--2. For the LRDR option combined with the LPAR option: The RTAS set-indicator specifying the isolate isolation-state of a COPLATFUN DR connector type must check that all processes associated with the function are detached; otherwise the RTAS routine must return with a Status -9001 (Valid outstanding translation) and the isolation-state is not changed.
set-indicator (dr-indicator)

Logical connectors do not have associated dr-indicators (token value 9002). An attempt to set the state of such an indicator results in a “No such indicator implemented” return status.

R1--1. For all LRDR options: The calling of set-indicator with a token value of 9002 (dr-indicator) and an index representing a logical connector must fail with a return code of “No such indicator implemented” (-3).
ibm,configure-connector

The ibm,configure-connector RTAS call is used to return to the OS the device tree nodes and properties associated with the newly unisolated logical resources and to configure them for use. The ibm,configure-connector RTAS call used against a logical DR connector can encounter other logical DR connectors or physical DR connectors below it in the tree. If a logical connector is encountered below a logical connector that is being configured, the ibm,configure-connector RTAS call will not configure the sub-tree if it is not owned by the OS (where “owned” refers to a DR connector that would return “DR entity usable” to a get-sensor dr-entity-sense call). If a physical connector is encountered, then the sub-tree below the physical connector may or may not be configured, depending on the implementation.

Architecture Note: The requirements of this section specify the minimum sub-tree contents returned for various connector types. Implementations may optionally return other valid previously reported nodes that represent the current configuration of the device tree. Previously reported nodes may not have any changes from their previously reported state. A node that was removed from the configuration due to a DR operation and returns due to a subsequent DR operation is not considered to have been previously reported. It is the caller's responsibility to recognize previously reported nodes.

R1--1. For all LRDR options: If a request to ibm,configure-connector specifies a connector that is isolated, ibm,configure-connector must immediately return configuration complete.

R1--2. For all LRDR options: If the connector index refers to a connector that would return a “DR entity unusable” status (2), “DR entity available for exchange” status (3), or “DR entity available for recovery” status (4) to the get-sensor dr-entity-sense token, the ibm,configure-connector RTAS call must return “-9003: Cannot configure - Logical DR connector unusable, available for exchange, or available for recovery” on the first call, without any configuration action taken on the DR connector.

R1--3. For all LRDR options: If a request to ibm,configure-connector specifies a connector of type CPU, the returned sub-tree must consist of the specific cpu node, its children, and any referenced nodes that had not been previously reported (such as L2 and L3 caches, etc.), all containing the properties as would be contained in those nodes had they been available at boot time.

Implementation Note: Future platforms that support concurrent maintenance of caches will require that high level cache nodes (L2, L3, etc.) are added by ibm,configure-connector such that their properties can change as new/repaired hardware is added to the platform. Therefore, it is the OS's responsibility when isolating a CPU to purge any information it may have regarding an orphaned high level cache node. The OS may use the “ibm,phandle” property to selectively remove caches when a processor is removed. The platform considers any high level cache that is newly referenced (reference count for this partition goes from 0 to 1) to have been previously unreported.

R1--4. For all LRDR options: If a request to ibm,configure-connector specifies a connector of type MEM, the returned sub-tree must consist of the specific ibm,memory-region node containing the properties as would be contained in that node had it been available at boot time.

R1--5.
For all LRDR options: If a request to ibm,configure-connector specifies a connector of type PHB or SLOT, then all of the following must be true:
- The returned values must represent the sub-tree for the specific I/O sub-system represented by the connector, except for entities below any DR connectors (logical or physical) which are below the connector which is the target of the ibm,configure-connector operation (that is, the ibm,configure-connector operation stops at any DR connector).
- The sub-tree must consist of the specific node and its children, all containing the properties as would be contained in those nodes had they been available at boot time, including (if they exist) built-in PCI IOAs.

R1--6. For all LRDR options: If a request to ibm,configure-connector specifies a connector of type SLOT, the returned values must represent the sub-tree for the specific I/O sub-system represented by the SLOT connector, and the sub-tree must consist of the specific /pci node and its children, all containing the properties as would be contained in those nodes had they been available at boot time, except for the PCI IOA nodes assigned to the OS image, which contain the same properties as they would following a PCI hot plug operation (see ).

R1--7. For all LRDR options: If a platform implementation powers up and configures physical DR entities in the sub-tree under a logical DR connector, then a request to ibm,configure-connector of the logical DR connector must use the return status of 990x from the ibm,configure-connector call, as necessary, during the DR entity power-up sequence(s), and must control any power-up and sequencing requirements, as would be done by the platform during platform power-up.

R1--8. For all LRDR options: If a request to ibm,configure-connector specifies a connector of type COPLATFAC or COPLATFUN, the returned values must represent the sub-tree for the specific coherent platform sub-system represented by the COPLATFAC or COPLATFUN connector. All properties are updated with the most recent data from the coherent platform resource.
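Finally, a C sketch of driving ibm,configure-connector to completion, including the 990x extended-delay statuses mentioned in Requirement R1--7. The wrapper signature is simplified (the real call exchanges device tree nodes and properties through a work area), and both the positive “more data” statuses and the 10^x millisecond interpretation of 990x are assumptions of this sketch.

    extern int  rtas_ibm_configure_connector(unsigned int drc_index, void *workarea);
    extern void delay_ms(unsigned int ms);           /* hypothetical */
    extern void consume_workarea(void *workarea);    /* hypothetical: absorb
                               returned nodes/properties into the OS tree */

    static int configure_connector(unsigned int drc_index, void *workarea)
    {
        for (;;) {
            int rc = rtas_ibm_configure_connector(drc_index, workarea);

            if (rc == 0)
                return 0;                       /* configuration complete */

            if (rc >= 9900 && rc <= 9905) {     /* 990x: extended delay */
                unsigned int ms = 1;
                for (int x = rc - 9900; x > 0; x--)
                    ms *= 10;                   /* assumed 10^x milliseconds */
                delay_ms(ms);
                continue;
            }

            if (rc > 0) {                       /* assumed: more tree data */
                consume_workarea(workarea);
                continue;
            }

            return rc;   /* e.g. -9003: unusable/exchange/recovery */
        }
    }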