Low-Level System Information
Machine Interface
The machine interface describes the specific use of the Power ISA 64-bit features to implement the ELF ABI version 2.
Processor Architecture
This ABI is predicated on, at a minimum, Power ISA version 2.7 and contains additional implementation characteristics. All OpenPOWER instructions that are defined by the Power Architecture can be assumed to be implemented and to work as specified. ABI-conforming implementations must provide these instructions through software emulation if they are not provided by the OpenPOWER-compliant processor. In addition, the instruction specification must meet additional implementation-defined specifics as commonly required by the OpenPOWER specification. OpenPOWER-compliant processors may support additional instructions beyond the published Power Instruction Set Architecture (ISA) and may include optional Power Architecture instructions. This ABI does not explicitly impose any performance constraints on systems.
Data Representation
Byte Ordering
The following standard data formats are recognized:
- 8-bit byte
- 16-bit halfword
- 32-bit word
- 64-bit doubleword
- 128-bit quadword

In little-endian byte ordering, the least-significant byte is located in the lowest addressed byte position in memory (byte 0). This byte ordering is alternately referred to as least-significant byte (LSB) ordering.

In big-endian byte ordering, the most-significant byte is located in the lowest addressed byte position in memory (byte 0). This byte ordering is alternately referred to as most-significant byte (MSB) ordering.

A specific OpenPOWER-compliant processor implementation must state which type of byte ordering is to be used.

MSR[LE|SLE]: On some systems it may be possible to change the active byte ordering of an application process through application-accessible configuration controls or through system calls. However, applications that change the active byte ordering during the course of execution do not conform to this ABI.

The following figures show the conventions assumed in little-endian byte ordering at the bit and byte levels. These conventions are applied to integer and floating-point data types. As shown in the example figure, byte numbers are indicated in the upper corners, and bit numbers are indicated in the lower corners.

[Figure: Little-Endian Bit and Byte Numbering Example. Labels: Little-Endian Byte Number; Little-Endian Bit Number End; Little-Endian Bit Number Start.]
Little-Endian Bit and Byte Numbering in Halfwords
  Byte number:  1 (MSB)   0 (LSB)
  Bits:         15–8      7–0

Little-Endian Bit and Byte Numbering in Words
  Byte number:  3 (MSB)   2         1         0 (LSB)
  Bits:         31–24     23–16     15–8      7–0

Little-Endian Bit and Byte Numbering in Doublewords
  Byte number:  7 (MSB)   6         5         4
  Bits:         63–56     55–48     47–40     39–32
  Byte number:  3         2         1         0 (LSB)
  Bits:         31–24     23–16     15–8      7–0

Little-Endian Bit and Byte Numbering in Quadwords
  Byte number:  15 (MSB)  14        13        12
  Bits:         127–120   119–112   111–104   103–96
  Byte number:  11        10        9         8
  Bits:         95–88     87–80     79–72     71–64
  Byte number:  7         6         5         4
  Bits:         63–56     55–48     47–40     39–32
  Byte number:  3         2         1         0 (LSB)
  Bits:         31–24     23–16     15–8      7–0
The following figures show the conventions assumed in big-endian byte ordering at the bit and byte levels. These conventions are applied to integer and floating-point data types. As shown in the example figure, byte numbers are indicated in the upper corners, and bit numbers are indicated in the lower corners.

[Figure: Big-Endian Bit and Byte Numbering Example. Labels: Big-Endian Byte Number; Big-Endian Bit Number Start; Big-Endian Bit Number End.]
Big-Endian Bit and Byte Numbering in Halfwords
  Byte number:  0 (MSB)   1 (LSB)
  Bits:         0–7       8–15

Big-Endian Bit and Byte Numbering in Words
  Byte number:  0 (MSB)   1         2         3 (LSB)
  Bits:         0–7       8–15      16–23     24–31

Big-Endian Bit and Byte Numbering in Doublewords
  Byte number:  0 (MSB)   1         2         3
  Bits:         0–7       8–15      16–23     24–31
  Byte number:  4         5         6         7 (LSB)
  Bits:         32–39     40–47     48–55     56–63

Big-Endian Bit and Byte Numbering in Quadwords
  Byte number:  0 (MSB)   1         2         3
  Bits:         0–7       8–15      16–23     24–31
  Byte number:  4         5         6         7
  Bits:         32–39     40–47     48–55     56–63
  Byte number:  8         9         10        11
  Bits:         64–71     72–79     80–87     88–95
  Byte number:  12        13        14        15 (LSB)
  Bits:         96–103    104–111   112–119   120–127
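As a minimal sketch, the two byte-numbering schemes in the figures above can be expressed as arithmetic on a value, independent of the host. The helper names le_byte_at, be_byte_at, and host_is_little_endian are hypothetical; the last one stores 0x01020304 and checks whether the byte at the lowest address is the least-significant byte.

```c
#include <stdint.h>
#include <string.h>

/* Byte stored at byte number n (0 = lowest address) under little-endian
   ordering: byte n holds bits 8n+7 through 8n of the value. */
static unsigned le_byte_at(uint64_t value, unsigned n) {
    return (unsigned)((value >> (8u * n)) & 0xFFu);
}

/* Byte stored at byte number n of a width_bytes-wide value under big-endian
   ordering: byte 0 holds the most-significant byte. */
static unsigned be_byte_at(uint64_t value, unsigned n, unsigned width_bytes) {
    return le_byte_at(value, width_bytes - 1u - n);
}

/* Inspects the host: returns 1 if the lowest-addressed byte of a stored
   32-bit word is its least-significant byte (little-endian), else 0. */
static int host_is_little_endian(void) {
    uint32_t word = 0x01020304u;
    unsigned char bytes[sizeof word];
    memcpy(bytes, &word, sizeof word);
    return bytes[0] == 0x04;
}
```

On a little-endian OpenPOWER system, host_is_little_endian() is expected to return 1, so le_byte_at(0x01020304, 0) gives the byte found first in memory.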
In the Power ISA, figures are generally shown only in big-endian byte order, and the bits in its data format specifications are numbered from left to right (MSB to LSB).

FPSCR Formats: As of Power ISA version 2.05, the FPSCR is extended from 32 bits to 64 bits. The fields of the original 32-bit FPSCR are now held in bits 32–63 of the 64-bit FPSCR. The assembly instructions that operate upon the 64-bit FPSCR either have a W instruction field added to select the operative word for the instruction (for example, mtfsfi) or are extended to operate upon the entire 64-bit FPSCR (for example, mffs). Fields of the FPSCR that represent one or more bits are referred to by field number with an indication of the operative word rather than by bit number.
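Because the Power ISA numbers bits from the most-significant end, a field described as bits s through e of a 64-bit register sits at the conventional right shift 63 - e. The following sketch uses hypothetical helper names; as an example, the binary rounding-mode field RN occupies bits 62–63 of the 64-bit FPSCR (bits 30–31 of the original 32-bit FPSCR), so isa_bits64(fpscr, 62, 63) would yield it for an FPSCR image held in a uint64_t.

```c
#include <stdint.h>

/* Extracts bits start..end (inclusive) of a 64-bit value using Power ISA
   numbering: bit 0 is the most-significant bit, bit 63 the least.
   Requires 0 <= start <= end <= 63. */
static uint64_t isa_bits64(uint64_t value, unsigned start, unsigned end) {
    unsigned width = end - start + 1u;
    uint64_t mask = (width == 64u) ? ~UINT64_C(0) : ((UINT64_C(1) << width) - 1u);
    return (value >> (63u - end)) & mask;
}

/* Converts an ISA bit number (MSB = 0) to the conventional LSB = 0 number
   for a register of reg_width bits. */
static unsigned isa_to_lsb0(unsigned isa_bit, unsigned reg_width) {
    return reg_width - 1u - isa_bit;
}
```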
Fundamental Types
The Scalar Types table describes the ISO C scalar types, and the Vector Types table describes the vector types of the POWER SIMD vector programming API. Each type has a required alignment, which is indicated in the Alignment column. Use of these types in data structures must follow the alignment specified, in the order encountered, to ensure consistent mapping. When variables are used individually, stricter alignment may be imposed if it has optimization benefits.

Scalar Types
Type                   ISO C Types                          sizeof  Alignment   Description
Boolean                _Bool                                1       Byte        Boolean
Character              char, unsigned char                  1       Byte        Unsigned byte
                       signed char                          1       Byte        Signed byte
Enumeration            signed enum                          4       Word        Signed word
                       unsigned enum                        4       Word        Unsigned word
Integral               int, signed int                      4       Word        Signed word
                       unsigned int                         4       Word        Unsigned word
                       long int, signed long int            8       Doubleword  Signed doubleword
                       unsigned long int                    8       Doubleword  Unsigned doubleword
                       long long int, signed long long int  8       Doubleword  Signed doubleword
                       unsigned long long int               8       Doubleword  Unsigned doubleword
                       short int, signed short int          2       Halfword    Signed halfword
                       unsigned short int                   2       Halfword    Unsigned halfword
                       __int128, signed __int128            16      Quadword    Signed quadword
                       unsigned __int128                    16      Quadword    Unsigned quadword
Pointer                any *                                8       Doubleword  Data pointer
                       any (*) ()                           8       Doubleword  Function pointer
Binary Floating-Point  float                                4       Word        Single-precision float
                       double                               8       Doubleword  Double-precision float
                       long double                          16      Quadword    Extended- or quad-precision float
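The Scalar Types table can be checked at compile time. This sketch assumes a GCC- or Clang-compatible compiler, which predefines __powerpc64__ and sets _CALL_ELF to 2 when targeting this ABI; on any other target the static assertions are skipped and only the portable helper plain_char_is_unsigned (a hypothetical name) remains.

```c
#include <limits.h>
#include <stdalign.h>

/* Compile-time checks of the Scalar Types table, enabled only when
   compiling for 64-bit ELF ABI version 2 on Power. */
#if defined(__powerpc64__) && defined(_CALL_ELF) && _CALL_ELF == 2
_Static_assert(sizeof(_Bool) == 1 && alignof(_Bool) == 1, "_Bool: byte");
_Static_assert(CHAR_MIN == 0, "plain char is an unsigned byte");
_Static_assert(sizeof(short) == 2 && alignof(short) == 2, "short: halfword");
_Static_assert(sizeof(int) == 4 && alignof(int) == 4, "int: word");
_Static_assert(sizeof(long) == 8 && alignof(long) == 8, "long: doubleword");
_Static_assert(sizeof(long long) == 8 && alignof(long long) == 8, "long long: doubleword");
_Static_assert(sizeof(__int128) == 16 && alignof(__int128) == 16, "__int128: quadword");
_Static_assert(sizeof(void *) == 8 && sizeof(void (*)(void)) == 8, "pointers: doubleword");
_Static_assert(sizeof(float) == 4 && sizeof(double) == 8, "float: word, double: doubleword");
_Static_assert(sizeof(long double) == 16 && alignof(long double) == 16, "long double: quadword");
#endif

/* Portable runtime helper: 1 if plain char is unsigned on the current target. */
static int plain_char_is_unsigned(void) {
    return CHAR_MIN == 0;
}
```

On an ELF v2 Power target, plain_char_is_unsigned() should return 1; on targets where plain char is signed (for example, x86-64 Linux) it returns 0.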
A NULL pointer has all bits set to zero.

A Boolean value is represented as a byte with a value of 0 or 1. If a byte with a value other than 0 or 1 is evaluated as a Boolean value (for example, through the use of unions), the behavior is undefined.

If an enumerated type contains a negative value, it is compatible with and has the same representation and alignment as int. Otherwise, it is compatible with and has the same representation and alignment as unsigned int.

For each real floating-point type, there is a corresponding imaginary type with the same size and alignment, and there is a corresponding complex type. The complex type has the same alignment as the real type and is twice the size; the representation is the real part followed by the imaginary part.

Vector Types
Type        Power SIMD C Types         sizeof  Alignment  Description
vector-128  vector unsigned char       16      Quadword   Vector of 16 unsigned bytes.
            vector signed char         16      Quadword   Vector of 16 signed bytes.
            vector bool char           16      Quadword   Vector of 16 bytes with a value of either 0 or 2^8 – 1.
            vector unsigned short      16      Quadword   Vector of 8 unsigned halfwords.
            vector signed short        16      Quadword   Vector of 8 signed halfwords.
            vector bool short          16      Quadword   Vector of 8 halfwords with a value of either 0 or 2^16 – 1.
            vector unsigned int        16      Quadword   Vector of 4 unsigned words.
            vector signed int          16      Quadword   Vector of 4 signed words.
            vector bool int            16      Quadword   Vector of 4 words with a value of either 0 or 2^32 – 1.
            vector unsigned long long  16      Quadword   Vector of 2 unsigned doublewords.
            vector unsigned long (a)
            vector signed long long    16      Quadword   Vector of 2 signed doublewords.
            vector signed long (a)
            vector bool long long      16      Quadword   Vector of 2 doublewords with a value of either 0 or 2^64 – 1.
            vector bool long (a)
            vector unsigned __int128   16      Quadword   Vector of 1 unsigned quadword.
            vector signed __int128     16      Quadword   Vector of 1 signed quadword.
            vector _Float16            16      Quadword   Vector of 8 half-precision floats.
            vector float               16      Quadword   Vector of 4 single-precision floats.
            vector double              16      Quadword   Vector of 2 double-precision floats.

(a) The vector long types are deprecated due to their ambiguity between 32-bit and 64-bit environments. The use of the vector long long types is preferred.
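The enumeration and complex-type rules above can be observed from C. In this sketch the enum names, the IS_UNSIGNED_TYPE macro, and complex_storage are hypothetical. It assumes a compiler that, like GCC and Clang on this ABI, makes a non-negative enum compatible with unsigned int, so _Generic selects the matching association; the check must be applied to an object or cast of the enum type, because enumeration constants themselves have type int in C.

```c
#include <complex.h>
#include <string.h>

/* Contains a negative enumerator: compatible with int. */
enum has_negative { NEG_LOW = -1, NEG_HIGH = 1 };
/* No negative enumerators: compatible with unsigned int. */
enum non_negative { NONNEG_LOW = 0, NONNEG_HIGH = 7 };

/* 1 for a type compatible with unsigned int, 0 for int, -1 otherwise. */
#define IS_UNSIGNED_TYPE(x) _Generic((x), unsigned int: 1, int: 0, default: -1)

/* Copies the storage of z: out[0] receives the real part and out[1] the
   imaginary part, matching the "real part first" representation. */
static void complex_storage(double complex z, double out[2]) {
    memcpy(out, &z, sizeof z);
}
```

For example, IS_UNSIGNED_TYPE((enum non_negative)0) evaluates to 1, and complex_storage(1.5 + 2.5 * I, parts) leaves 1.5 in parts[0] and 2.5 in parts[1].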
Each element of a Boolean vector data type must have either all bits set to 0 or all bits set to 1; such an element is said to be well formed. If at least one element of a Boolean vector is not well formed, the result of any computation on that vector is undefined for all vector elements.

Decimal Floating-Point (ISO TR 24732 Support)
The decimal floating-point data type is used to specify variables corresponding to the IEEE 754-2008 densely packed decimal floating-point format.

Decimal Floating-Point Types
Type                    ISO TR 24732 C Types  sizeof  Alignment   Description
Decimal Floating-Point  _Decimal32            4       Word        Single-precision decimal float.
                        _Decimal64            8       Doubleword  Double-precision decimal float.
                        _Decimal128           16      Quadword    Quad-precision decimal float.
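The well-formedness rule for Boolean vector elements can be sketched as a byte-level check. To keep the sketch runnable on any host, it models a 128-bit Boolean vector as 16 plain bytes rather than using the vector bool types from altivec.h; the function name is hypothetical, and elem_bytes is assumed to be 1, 2, 4, or 8 (bool char, short, int, or long long).

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every element of elem_bytes bytes in the 16-byte vector has
   all bits 0 or all bits 1, else 0. elem_bytes must divide 16. */
static int bool_vector_well_formed(const uint8_t v[16], size_t elem_bytes) {
    for (size_t e = 0; e < 16; e += elem_bytes) {
        uint8_t first = v[e];
        if (first != 0x00 && first != 0xFF)
            return 0;                       /* mixed bits within a byte */
        for (size_t i = 1; i < elem_bytes; i++)
            if (v[e + i] != first)
                return 0;                   /* bytes of one element disagree */
    }
    return 1;
}
```

The same 16 bytes can be well formed at one element size and not at another: a single 0xFF byte is a valid bool char element but an ill-formed half of a bool short element.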
IBM EXTENDED PRECISION Type
Type                    ISO C Types  sizeof  Alignment  Description
IBM EXTENDED PRECISION  long double  16      Quadword   Two double-precision floats.
IEEE BINARY 128 QUADRUPLE PRECISION Type
Type                                 ISO C Types  sizeof  Alignment  Description                         Notes
IEEE BINARY 128 QUADRUPLE PRECISION  long double  16      Quadword   IEEE 128-bit quad-precision float.
IEEE BINARY 128 QUADRUPLE PRECISION  _Float128    16      Quadword   IEEE 128-bit quad-precision float.  Phased in (a), (b)

(a) This type is being phased in and it may not be available on all implementations.
(b) __float128 shall be recognized as a synonym for the _Float128 data type, and the two names are used interchangeably to refer to the same type. Implementations that do not offer support for _Float128 may provide this type with the __float128 type only.
IBM EXTENDED PRECISION && IEEE BINARY 128 QUADRUPLE PRECISION
Availability of the long double data type is subject to conformance to a long double standard in which the IBM EXTENDED PRECISION format and the IEEE BINARY 128 QUADRUPLE PRECISION format are mutually exclusive.

IEEE BINARY 128 QUADRUPLE PRECISION || IBM EXTENDED PRECISION
This ABI provides the following choices for implementation of long double in compilers and systems. The preferred implementation for long double is the IEEE 128-bit quad-precision binary floating-point type.

IEEE BINARY 128 QUADRUPLE PRECISION
Long double is implemented as an IEEE 128-bit quad-precision binary floating-point type in accordance with the applicable IEEE floating-point standards. Support is provided for all IEEE standard features. IEEE128 quad-precision values are passed and returned in VMX parameter registers. With some compilers, _Float128 can be used to access IEEE128 independently of the floating-point representation chosen for the long double ISO C type. However, this is not part of the C standard.

IBM EXTENDED PRECISION
Support is provided for the IBM EXTENDED PRECISION format. In this format, a value is represented as the sum of two double-precision numbers with different magnitudes that do not overlap, which provides an effective precision of 106 bits or more, depending on the value. The high-order double-precision value (the one that comes first in storage) must have the larger magnitude. The high-order double-precision value must equal the sum of the two values, rounded to nearest double (the Linux convention, unlike AIX). The IBM EXTENDED PRECISION format provides the same range as double precision (about 10^-308 to 10^308) but more precision (a variable amount, about 31 decimal digits or more). As the absolute value of the magnitude decreases (near the denormal range), the precision available in the low-order double also decreases.
When the value represented is in the subnormal or denormal range, this representation provides no more precision than 64-bit (double) floating-point. The actual number of bits of precision can vary. If the low-order part is much less than one unit of least precision (ULP) of the high-order part, significant bits (all 0s or all 1s) are implied between the significands of the high-order and low-order numbers. Some algorithms that rely on having a fixed number of bits in the significand can fail when using extended precision.

This implementation differs from the IEEE 754 Standard in the following ways:
- The software support is restricted to round-to-nearest mode. Programs that use extended precision must ensure that this rounding mode is in effect when extended-precision calculations are performed.
- This implementation does not fully support the IEEE special numbers NaN and INF. These values are encoded in the high-order double value only. The low-order value is not significant, but the low-order value of an infinity must be positive or negative zero.
- This implementation does not support the IEEE status flags for overflow, underflow, and other conditions. These flags have no meaning in this format.
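The IBM EXTENDED PRECISION representation can be illustrated with ordinary double arithmetic. This sketch does not use the compiler's long double type: the struct and function names are hypothetical, and it assumes IEEE 754 doubles in round-to-nearest mode (no options such as -ffast-math). It builds a high/low pair with the Fast2Sum technique, so that the high part is the rounded sum and the low part is the exact rounding error, and checks the Linux convention that the high part equals the sum rounded to nearest.

```c
/* High-order double first in storage, then the low-order double. */
struct ibm_ext {
    double hi;
    double lo;
};

/* Fast2Sum: requires |a| >= |b| and round-to-nearest. hi is a + b rounded
   to nearest double; lo is the exact error, so hi + lo == a + b exactly. */
static struct ibm_ext ibm_ext_from_sum(double a, double b) {
    struct ibm_ext r;
    r.hi = a + b;
    r.lo = b - (r.hi - a);
    return r;
}

/* Linux convention: hi must equal (hi + lo) rounded to nearest double. */
static int ibm_ext_is_canonical(struct ibm_ext x) {
    return x.hi + x.lo == x.hi;
}
```

For example, ibm_ext_from_sum(1.0, 0x1p-200) yields hi = 1.0 and lo = 2^-200: the value 1 + 2^-200 is held exactly, with the zero bits between the two significands implied, which is why the effective precision varies with the value and can far exceed 106 bits.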
Aggregates and Unions
The following rules for aggregates (structures and arrays) and unions apply to their alignment and size:
- The entire aggregate or union must be aligned to its most strictly aligned member, which corresponds to the member with the largest alignment, including flexible array members.
- Each member is assigned the lowest available offset that meets the alignment requirements of the member. Depending on the previous member, internal padding can be required.
- Unless it is packed, the entire aggregate or union must have a size that is a multiple of its alignment. Depending on the last member, tail padding may be required.

In the following figures, the big-endian byte offsets are located in the upper left corners, and the little-endian byte offsets are located in the upper right corners.
Structure Smaller than a Word
Structure with No Padding
Structure with Internal Padding
Structure with Internal and Tail Padding
Structure with Vector Element and Internal Padding
Structure with Vector Element and Tail Padding
Structure with Internal Padding and Vector Element
Structure with Internal Padding and 128-Bit Integer
Packed Structure
Union Allocation
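The three aggregate layout rules can be applied mechanically. This sketch uses a hypothetical helper, layout_struct, that takes member sizes and power-of-two alignments in declaration order, assigns each member the lowest suitably aligned offset, and rounds the total size up to the largest member alignment. The struct padded is an illustrative structure with internal and tail padding (not necessarily the one drawn in the figures), so the computed offsets can be compared against offsetof.

```c
#include <stddef.h>

/* Applies the alignment, offset, and size rules to n members.
   Fills offset[] and *agg_align; returns the aggregate size. */
static size_t layout_struct(const size_t *size, const size_t *align, size_t n,
                            size_t *offset, size_t *agg_align) {
    size_t off = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        off = (off + align[i] - 1) & ~(align[i] - 1);   /* internal padding */
        offset[i] = off;
        off += size[i];
        if (align[i] > max_align)
            max_align = align[i];
    }
    *agg_align = max_align;
    return (off + max_align - 1) & ~(max_align - 1);    /* tail padding */
}

/* Offsets 0, 8, and 16; size 24 with 5 bytes of tail padding after s. */
struct padded {
    char c;
    double d;
    short s;
};
```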
Bit Fields
Bit fields can be present in definitions of C structures and unions. Each bit field defines an object within the structure or union whose width in bits is specified explicitly. In the Bit Field Types table, a signed bit field of width w has a range from -2^(w-1) to 2^(w-1) - 1, and an unsigned bit field has a range from 0 to 2^w - 1.

Bit Field Types
Bit Field Type                            Width (w)
_Bool                                     1
signed char, unsigned char                1–8
signed short, unsigned short              1–16
signed int, unsigned int, enum            1–32
signed long, unsigned long,               1–64
  signed long long, unsigned long long
signed __int128, unsigned __int128        1–128
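A signed bit field of width w holds -2^(w-1) through 2^(w-1) - 1, and an unsigned one holds 0 through 2^w - 1. The following minimal sketch computes these bounds for widths 1–64; the function names are hypothetical, and the 128-bit types would need 128-bit arithmetic. Width 64 is special-cased because shifting a 64-bit value by 64 (or a signed one into its sign bit) is undefined in C.

```c
#include <stdint.h>

/* Bounds of a bit field of width w, for 1 <= w <= 64. */
static int64_t bf_signed_min(unsigned w) {
    return (w == 64) ? INT64_MIN : -((int64_t)1 << (w - 1));
}

static int64_t bf_signed_max(unsigned w) {
    return (w == 64) ? INT64_MAX : ((int64_t)1 << (w - 1)) - 1;
}

static uint64_t bf_unsigned_max(unsigned w) {
    return (w == 64) ? UINT64_MAX : ((uint64_t)1 << w) - 1;
}
```

For a 5-bit field, these give -16 to 15 (signed) and 0 to 31 (unsigned).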
Bit fields can be signed or unsigned and of type short, int, long, or long long. The permitted widths are the same for the signed and unsigned variants of each type; for example, signed short and unsigned short bit fields both have widths of 1–16.

All members of structures and unions, including bit fields, must comply with the size and alignment rules. The following additional size and alignment rules apply to bit fields:
- The allocation of bit fields is determined by the system endianness. For little-endian implementations, bits are allocated from the least-significant (right) end to the most-significant (left) end. The reverse is true for big-endian implementations: bits are allocated from the most-significant (left) end to the least-significant (right) end.
- Unless it appears in a packed structure, a bit field cannot cross its unit boundary; it must occupy part or all of the storage unit allocated for its declared type.
- If there is enough space within a storage unit, bit fields must share the storage unit with other structure members, including members that are not bit fields. All the structure members still occupy different parts of the storage unit.
- The types of unnamed bit fields have no effect on the alignment of a structure or union. However, the offsets of individual bit-field members must comply with the alignment rules.
- An unnamed bit field of zero width causes sufficient padding (possibly none) to be inserted for the next member, or the end of the structure if there are no more nonzero-width members, to have an offset from the start of the structure that is a multiple of the size of the declared type of the zero-width member.

In the following figure, the little-endian byte offsets are given in the upper right corners, and the bit numbers are given in the lower corners.

[Figure: Little-Endian Bit Numbering for 0x01020304. Byte offsets 7–4 cover bits 63–32 and byte offsets 3–0 cover bits 31–0; the byte contents shown are 0 1 0 2 (offsets 7–4) and 0 3 0 4 (offsets 3–0).]
In the following figure, the big-endian byte offsets are given in the upper left corners, and the bit numbers are given in the lower corners.

[Figure: Big-Endian Bit Numbering for 0x01020304. Byte offsets 0–3 cover bits 0–31 and byte offsets 4–7 cover bits 32–63; the byte contents shown are 0 1 0 2 (offsets 0–3) and 0 3 0 4 (offsets 4–7).]
The byte offsets for structure and union members are shown in the following figures.
Simple Bit Field Allocation
Bit Field Allocation with Boundary Alignment
Bit Field Allocation with Storage Unit Sharing
Bit Field Allocation in a Union
Bit Field Allocation with Unnamed Bit Fields
In the Bit Field Allocation with Unnamed Bit Fields figure, the alignment of the structure is not affected by the unnamed short and int fields. The named members are aligned relative to the start of the structure. However, the named members might not fall on optimal boundaries in memory. For instance, in an array of that structure, the d members will not all be on 4-byte (integer) boundaries.
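Three of the bit-field rules can be observed in a compiled program. The structures below are illustrative and are not the ones drawn in the figures; the names are hypothetical. On an OpenPOWER target they should lay out as the comments describe, and common GCC and Clang targets such as x86-64 typically produce the same layout, so the endianness-dependent check is written against the host byte order.

```c
#include <stddef.h>
#include <string.h>

/* Two 4-bit fields in one 32-bit storage unit. */
struct nibbles { unsigned int a : 4, b : 4; };

/* y does not fit in the 2 bits left after x and may not cross the 32-bit
   storage unit, so it starts in the next unit: sizeof is 8. */
struct no_cross { unsigned int x : 30, y : 4; };

/* The unnamed zero-width int field pads b to the next offset that is a
   multiple of sizeof(int). */
struct zero_width { char a; int : 0; char b; };

/* Lowest-addressed byte of a struct nibbles with a = 1 and b = 2: 0x21 when
   bits are allocated from the least-significant end (little-endian), 0x12
   when allocated from the most-significant end (big-endian). */
static unsigned char nibbles_first_byte(void) {
    struct nibbles n;
    unsigned char first;
    memset(&n, 0, sizeof n);
    n.a = 1;
    n.b = 2;
    memcpy(&first, &n, 1);
    return first;
}
```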
Function Calling Sequence
The standard sequence for function calls is outlined in this section. The layout of the stack frame, the parameter passing convention, and the register usage are also described in this section. Standard library functions use these conventions, except as documented for the register save and restore functions.

C programs adhere to the conventions given in this section. For more information about the implementation of C, see https://apps.na.collabserv.com/meetings/join?id=2897-3986 .

While it is recommended that all functions use the standard calling sequence, the requirements of the standard calling sequence apply only to global functions. Local functions that cannot be reached from other compilation units can use different calling sequences and conventions if they comply with the stack back trace requirements. Some tools may not work with alternate calling sequences and conventions.
Function Call Linkage Protocols
The compiler (or assembly programmer) and linker cooperate to make function calls as efficient as possible. Different protocols are required depending on whether a call is local, whether the caller and/or callee use a TOC pointer in r2 for code or data accesses, and whether the caller and/or callee guarantee to preserve r2.

A local function call is one where the callee is known and visible within the unit of code being compiled or assembled. A function that uses a TOC pointer always has a separate local entry point, and preserves r2 when called via its local entry point. This information is encoded in the symbol table entries of functions.

The Protocols for External Function Calls table summarizes the protocol requirements for external function calls, and the Protocols for Local Function Calls table summarizes the protocol requirements for local function calls. Each entry in these tables is further described in the referenced section.

Note that this ABI does not define protocols where the caller does not use a TOC pointer but does preserve r2. Such functions are most efficient when they are leaf procedures. Such a function is not forbidden from calling another function, but in that case it is up to the caller to save and restore r2 around each call.

Protocols for External Function Calls
Caller                                  Callee  PLT stub    nop needed?  Relocation           Section link
Uses TOC                                Any     r2 save     Yes          R_PPC64_REL24        External Call, Caller Uses TOC
Does not use TOC, does not preserve r2  Any     No r2 save  No           R_PPC64_REL24_NOTOC  External Call, Caller Does Not Use TOC, Caller Does Not Preserve r2
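Protocols for Local Function Calls
Caller                                  Callee                                  Call method     nop needed?  Relocation           Section link
Uses TOC                                Uses TOC                                Local           No           R_PPC64_REL24        Local Call, Caller Uses TOC, Callee Preserves r2
Uses TOC                                Does not use TOC, preserves r2          Local           No           R_PPC64_REL24        Local Call, Caller Uses TOC, Callee Preserves r2
Uses TOC                                Does not use TOC, does not preserve r2  r2 save stub    Yes          R_PPC64_REL24        Local Call, Caller Uses TOC, Callee Does Not Preserve r2
Does not use TOC, does not preserve r2  Uses TOC                                r12 setup stub  No           R_PPC64_REL24_NOTOC  Local Call, Caller Does Not Preserve r2, Callee Uses TOC
Does not use TOC, does not preserve r2  Does not use TOC                        Local           No           R_PPC64_REL24_NOTOC  Local Call, Caller Does Not Preserve r2, Callee Does Not Use TOC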
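To make the decision logic of the two protocol tables concrete, the following C sketch encodes them as a function. Everything here (classify_call and its enums and structs) is a hypothetical model for illustration, not compiler or linker code. Given how the caller and callee treat r2 and whether the call is local, it reports the relocation on the bl instruction, whether a nop must follow it, and which stub, if any, the linker supplies. The combination the ABI leaves undefined (caller does not use a TOC but preserves r2) is rejected.

```c
#include <stdbool.h>

enum reloc { RELOC_REL24, RELOC_REL24_NOTOC };
enum stub_kind {
    STUB_NONE,           /* direct call to the local entry point */
    STUB_PLT_R2_SAVE,    /* PLT stub that saves r2 */
    STUB_PLT_NO_R2_SAVE, /* PLT stub without an r2 save */
    STUB_R2_SAVE,        /* local stub that saves r2, no r12 setup */
    STUB_R12_SETUP       /* local stub that loads the global entry into r12 */
};

struct fn_traits { bool uses_toc; bool preserves_r2; };
struct call_protocol { enum reloc relocation; bool nop_after_bl; enum stub_kind stub; };

/* Returns false for caller/callee combinations this ABI does not define. */
static bool classify_call(struct fn_traits caller, struct fn_traits callee,
                          bool local, struct call_protocol *out) {
    if (!caller.uses_toc && caller.preserves_r2)
        return false;
    if (caller.uses_toc) {
        out->relocation = RELOC_REL24;
        if (!local) {
            out->nop_after_bl = true;          /* nop later becomes an r2 restore */
            out->stub = STUB_PLT_R2_SAVE;
            return true;
        }
        /* A TOC-using callee preserves r2 when called at its local entry. */
        bool callee_preserves = callee.uses_toc || callee.preserves_r2;
        out->nop_after_bl = !callee_preserves;
        out->stub = callee_preserves ? STUB_NONE : STUB_R2_SAVE;
        return true;
    }
    /* Caller neither uses a TOC nor preserves r2. */
    out->relocation = RELOC_REL24_NOTOC;
    out->nop_after_bl = false;
    if (!local)
        out->stub = STUB_PLT_NO_R2_SAVE;
    else
        out->stub = callee.uses_toc ? STUB_R12_SETUP : STUB_NONE;
    return true;
}
```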
External Call, Caller Uses TOC
When a function that uses a TOC pointer makes any call to an external function, the compiler generates a nop instruction after the bl instruction for the call. The linker generates a procedure linkage table (PLT) stub that saves r2 and replaces the nop instruction with a restore of r2. (The save of r2 may be omitted from the PLT stub if the R_PPC64_TOCSAVE relocation is used.) If the callee requires a TOC, the PLT stub also includes code to place the callee's global entry point into r12. PLT stubs are described in full elsewhere in this ABI.
External Call, Caller Does Not Use TOC, Caller Does Not Preserve r2
When a function that does not use a TOC pointer and does not preserve r2 makes any call to an external function, the compiler does not generate a nop instruction after the bl instruction for the call. Instead, the compiler annotates the bl instruction with an R_PPC64_REL24_NOTOC relocation. The linker generates a PLT stub that does not include a save of r2. If the callee requires a TOC, the PLT stub also includes code to place the callee's global entry point into r12.
Local Call, Caller Uses TOC, Callee Preserves r2
When a function that uses a TOC pointer makes a local call to a function that also preserves r2, the compiler generates a direct call to the function's local entry point, and does not generate a nop instruction after the call.
Local Call, Caller Uses TOC, Callee Does Not Preserve r2
When a function that uses a TOC pointer makes a local call to a function that does not preserve r2, the compiler generates a nop instruction after the call. The linker generates a PLT stub that saves r2, but does not include code to place the callee's global entry point into r12, and replaces the nop instruction with a restore of r2. (The save of r2 may be omitted from the PLT stub if the R_PPC64_TOCSAVE relocation is used.)
Local Call, Caller Does Not Preserve r2, Callee Uses TOC
When a function that does not use a TOC and does not preserve r2 makes a local call to a function that requires a TOC pointer, the compiler does not generate a nop instruction after the bl instruction for the call. The linker generates a PLT stub that does not include a save of r2, but does include code to place the callee's global entry point into r12. The compiler annotates the bl instruction with an R_PPC64_REL24_NOTOC relocation.
Local Call, Caller Does Not Preserve r2, Callee Does Not Use TOC
When a function that does not use a TOC and does not preserve r2 makes a local call to a function that does not require a TOC pointer, the compiler generates a direct call to the function's local entry point, and does not generate a nop instruction after the call. The compiler annotates the bl instruction with an R_PPC64_REL24_NOTOC relocation.
Registers
Programs and compilers may freely use all registers except those reserved for system use. The system signal handlers are responsible for preserving the original values upon return to the original execution path. Signals that can interrupt the original execution path are documented in the System V Interface Definition (SVID).

The tables in this section give an overview of the registers that are global during program execution. The tables use four terms to describe register preservation rules:

Nonvolatile
A caller can expect that the contents of all registers marked nonvolatile are valid after control returns from a function call. A callee shall save the contents of all registers marked nonvolatile before modification. The callee must restore the contents of all such registers before returning to its caller.

Volatile
A caller cannot trust that the contents of registers marked volatile have been preserved across a function call. A callee need not save the contents of registers marked volatile before modification.

Limited-access
The contents of registers marked limited-access have special preservation rules. These registers have mutability restricted to certain bit fields as defined by the Power ISA. The individual bits of these bit fields are defined by this ABI to be limited-access. Under normal conditions, a caller can expect that these bits have been preserved across a function call. Under the special conditions that this ABI defines for limited-access bits, a caller must assume that these bits may have changed across a function call, even if they have not. A callee may permanently modify these bits, without preserving their state on entry to the function, only if it satisfies those special conditions. Otherwise, these bits must be saved before modification and restored before returning to the caller.

Reserved
The contents of registers marked reserved are for exclusive use of system functions, including the ABI.
In limited circumstances, programs or program libraries may set or query such registers, but only when explicitly allowed in this document.
Register Roles
In the 64-bit OpenPOWER Architecture, there are always 32 general-purpose registers, each 64 bits wide. Throughout this document the symbol rN is used, where N is a register number, to refer to general-purpose register N.

Register Roles
Register  Preservation Rules   Purpose
r0        Volatile             Optional use in function linkage. Used in function prologues.
r1        Nonvolatile          Stack frame pointer.
r2        Nonvolatile (a)      TOC pointer.
          or Volatile (b)
r3–r10    Volatile             Parameter and return values.
r11       Volatile             Optional use in function linkage. Used as an environment pointer in languages that require environment pointers.
r12       Volatile             Optional use in function linkage. Function entry address at the global entry point.
r13       Reserved             Thread pointer.
r14–r31   Nonvolatile          Local variables. (c)
LR        Volatile             Link register.
CTR       Volatile             Loop count register.
TAR       Reserved             Reserved for system use. This register should not be read or written by application software.
XER       Volatile             Fixed-point exception register.
CR0–CR1   Volatile             Condition register fields.
CR2–CR4   Nonvolatile          Condition register fields.
CR5–CR7   Volatile             Condition register fields.
DSCR      Limited-access       Data stream prefetch control.
VRSAVE    Reserved             Reserved for system use. This register should not be read or written by application software.

(a) Register r2 is nonvolatile with respect to calls between most functions in the same compilation unit. It is saved and restored by code inserted by the linker resolving a call to an external function.
(b) Register r2 is volatile and available for use in a function that does not use a TOC pointer and that does not preserve r2.
(c) If a function needs a frame pointer, assigning r31 to the role of the frame pointer is recommended.
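A tool that checks register usage might encode the Register Roles table as a lookup. This sketch is hypothetical (gpr_rule, cr_field_rule, and spr_rule are illustrative names). It simplifies r2 to nonvolatile, the rule for functions that use a TOC pointer, and treats any special-purpose register name it does not recognize as volatile, which covers LR, CTR, and XER.

```c
#include <string.h>

enum preservation { VOLATILE, NONVOLATILE, RESERVED, LIMITED_ACCESS };

/* General-purpose registers r0..r31. */
static enum preservation gpr_rule(int n) {
    if (n == 1 || n == 2 || (n >= 14 && n <= 31))
        return NONVOLATILE;      /* r1 stack pointer, r2 TOC, r14-r31 locals */
    if (n == 13)
        return RESERVED;         /* thread pointer */
    return VOLATILE;             /* r0, r3-r10, r11, r12 */
}

/* Condition register fields CR0..CR7. */
static enum preservation cr_field_rule(int field) {
    return (field >= 2 && field <= 4) ? NONVOLATILE : VOLATILE;
}

/* Special-purpose registers by name. */
static enum preservation spr_rule(const char *name) {
    if (strcmp(name, "DSCR") == 0)
        return LIMITED_ACCESS;
    if (strcmp(name, "TAR") == 0 || strcmp(name, "VRSAVE") == 0)
        return RESERVED;
    return VOLATILE;             /* LR, CTR, XER */
}
```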
TOC Pointer Usage
The TOC pointer, r2, is commonly initialized by the global function entry point when a function is called through the global entry point. A function may be called through its global entry point from a module other than its own, or from an unknown call point, such as through a function pointer. In those instances, it is the caller's responsibility to store the TOC pointer, r2, in the TOC pointer doubleword of the caller's stack frame. For references external to the compilation unit, this code is inserted by the static linker if a function is to be resolved by the dynamic linker. For references through function pointers, it is the compiler's or assembly programmer's responsibility to insert appropriate TOC save and restore code.

If the function is called from the same module as the callee, the callee must normally preserve the value of r2. If the callee is called from a function in the same compilation unit and does not preserve r2, the caller is responsible for saving and restoring the TOC pointer if it needs it.

When a function calls another function that requires a TOC pointer, the TOC pointer must have a legal value pointing to the TOC base.

When global data is accessed, the TOC pointer must be available for dereference at the point of all uses of values derived from the TOC pointer in conjunction with the @l operator. The linker uses this property to optimize TOC pointer accesses. In addition, all reaching definitions for a TOC-pointer-derived access must compute the same definition for code to be ABI compliant. In some implementations, non-ABI-compliant code may be processed by providing additional linker options; for example, linker options disabling linker optimization.
However, this behavior in support of non-ABI-compliant code is not guaranteed to be portable and supported in all systems. Examples of compliant and noncompliant code are given elsewhere in this ABI.

Optional Function Linkage
Except as follows, a function cannot depend on the values of those registers that are optional in the function linkage (r0, r11, and r12) because they may be altered by interlibrary calls:
- When a function is entered in a way that initializes its environment pointer, register r11 contains the environment pointer. It is used to support languages with access to additional environment context; for example, languages that support lexical nesting use it to access the lexically enclosing outer context.
- When a function that requires a TOC pointer is entered through its global entry point, register r12 contains the entry-point address. For more information, see the description of dual entry points.

Stack Frame Pointer
The stack pointer always points to the lowest allocated valid stack frame. It must maintain quadword alignment and grow toward the lower addresses. When the code has been compiled to maintain back chains, the doubleword at that address points to the previously allocated stack frame. A called function is permitted to decrement the stack pointer if required.

Link Register
The link register contains the address that a called function normally returns to. It is volatile across function calls.

Condition Register Fields
In the condition register, the bit fields CR2, CR3, and CR4 are nonvolatile. The value on entry must be restored on exit. The other bit fields are volatile.
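The stack pointer rules can be illustrated with a small simulation. This sketch does not touch the real r1: it builds frames inside an ordinary array, keeps every frame quadword (16-byte) aligned, grows toward lower addresses, and stores the back chain in the first doubleword of each frame so the chain can be walked. The names frame_round, push_frame, and count_frames are hypothetical, and the outermost frame is assumed to hold a zero back chain.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounds a frame size up to a multiple of 16 bytes (quadword alignment). */
static size_t frame_round(size_t bytes) {
    return (bytes + 15u) & ~(size_t)15u;
}

/* Allocates a frame below sp (the stack grows toward lower addresses) and
   stores the caller's stack pointer, the back chain, at offset 0. */
static uint64_t *push_frame(uint64_t *sp, size_t bytes) {
    uint64_t *new_sp = sp - frame_round(bytes) / sizeof(uint64_t);
    new_sp[0] = (uint64_t)(uintptr_t)sp;
    return new_sp;
}

/* Follows back-chain pointers from sp until a zero chain and counts frames. */
static int count_frames(const uint64_t *sp) {
    int n = 0;
    while (sp != NULL) {
        n++;
        sp = (const uint64_t *)(uintptr_t)sp[0];
    }
    return n;
}
```

Starting from a 16-byte-aligned root frame with a zero back chain, pushing frames of 40 and 112 bytes moves the simulated stack pointer down by 48 and 112 bytes, and count_frames then reports three frames.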
This ABI requires OpenPOWER-compliant processors to implement mfocr instructions in a manner that initializes undefined bits of the RT result register of mfocr instructions to one of the following values:
- 0, in accordance with OpenPOWER-compliant processor implementation practice
- The architected value of the corresponding CR field in the mfocr instruction

When executing an mfocr instruction, the POWER8 processor does not implement the behavior described in the "Fixed-Point Invalid Forms and Undefined Conditions" section of the POWER8 Processor User's Manual for the Single-Chip Module. Instead, it replicates the selected condition register field within the byte that contains it rather than initializing to 0 the bits corresponding to the nonselected bits of that byte. When generating code to save two condition register fields that are stored in the same byte, the compiler must mask the value received from mfocr to avoid corruption of the resulting (partial) condition register word. This erratum does not apply to POWER9 and subsequent processors. For more information, see Power ISA, version 3.0B and "Fixed-Point Invalid Forms and Undefined Conditions" in the POWER9 Processor User's Manual.

Floating-Point Registers
In OpenPOWER-compliant processors, floating-point and vector functions are implemented using a unified vector-scalar model. As shown in the Floating-Point Registers as Part of VSRs and Vector Registers as Part of VSRs figures, there are 64 vector-scalar registers; each is 128 bits wide. The vector-scalar registers can be addressed with vector-scalar instructions, for vector and scalar processing of all 64 registers, or with the "classic" Power floating-point instructions to refer to a 32-register subset of 64 bits per register. They can also be addressed with VMX instructions to refer to a 32-register subset of 128-bit wide registers.
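Returning to the POWER8 mfocr behavior described above, the masking requirement can be illustrated with a simplified software model. It operates on a 32-bit condition register image in which CR0 is the most-significant nibble. It models the POWER8 result as the selected field in place plus a copy in the other nibble of the same byte, with all other bits 0 (an assumption of this sketch), and shows why two fields that share a byte must be masked before they are combined. All function names are hypothetical.

```c
#include <stdint.h>

/* Mask selecting CR field n (0..7); CR0 is the most-significant nibble. */
static uint32_t cr_field_mask(unsigned n) {
    return UINT32_C(0xF) << (28u - 4u * n);
}

/* Simplified POWER8 model: field n appears in place and is replicated into
   the other nibble of its byte (fields 2k and 2k+1 share a byte). */
static uint32_t mfocr_power8_model(uint32_t cr, unsigned n) {
    uint32_t field = (cr >> (28u - 4u * n)) & 0xFu;
    unsigned byte_shift = 24u - 8u * (n / 2u);
    return (field << byte_shift) | (field << (byte_shift + 4u));
}

/* Combining two separately read fields: masking each result removes the
   replicated copy that would otherwise overwrite the neighboring field. */
static uint32_t save_two_fields(uint32_t cr, unsigned f1, unsigned f2) {
    return (mfocr_power8_model(cr, f1) & cr_field_mask(f1)) |
           (mfocr_power8_model(cr, f2) & cr_field_mask(f2));
}
```

With CR2 = 0xA and CR3 = 0x5 (image 0x00A50000), OR-ing the two unmasked model results gives 0x00FF0000, a corrupted byte, while save_two_fields returns the correct 0x00A50000.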
Floating-Point Registers as Part of VSRs
Vector Registers as Part of VSRs
The classic floating-point repertoire consists of 32 floating-point registers, each 64 bits wide, and an associated special-purpose register that provides floating-point status and control. Throughout this document, the symbol fN, where N is a register number, refers to floating-point register N. For the purpose of function calls, the right half of the VSX registers corresponding to the classic floating-point registers (that is, vsr0–vsr31) is volatile.

Floating-Point Register Roles for Binary Floating-Point Types
Register   Preservation Rules   Purpose
f0         Volatile             Local variables.
f1–f13     Volatile             Used for parameter passing and return values of binary float types.
f14–f31    Nonvolatile          Local variables.
FPSCR      Limited-access       Floating-Point Status and Control Register limited-access bits.
Preservation rules governing the limited-access bits for the bit fields [VE], [OE], [UE], [ZE], [XE], and [RN] are presented in .
DFP Support
The OpenPOWER ABI supports the decimal floating-point (DFP) format and DFP language extensions. The default implementation of DFP types shall be an implementation of the IEEE DFP standard (IEEE Standard 754-2008). The default may be either a hardware or a software implementation.

The Power ISA decimal floating-point category extends the Power Architecture by adding a decimal floating-point unit. It uses the existing 64-bit floating-point registers and extends the FPSCR register to 64 bits, defining a decimal rounding-control field in the extended space.

Single-precision, double-precision, and quad-precision decimal floating-point parameters shall be passed in the floating-point registers. Single-precision decimal floating-point values shall occupy the lower half of a floating-point register. Quad-precision decimal floating-point values shall occupy an even/odd register pair. When quad-precision decimal floating-point parameters are passed in accordance with this ABI, an odd floating-point register may be skipped in allocation order so that quad-precision parameters and results are aligned in an even/odd register pair. When a floating-point register is skipped during input parameter allocation, the corresponding GPR or memory doubleword in the parameter list is not skipped.

Floating-Point Register Roles for Decimal Floating-Point Types
Register   Preservation Rules   Purpose
FPSCR      Limited-access       Floating-Point Status and Control Register limited-access bits.
Preservation rules governing the limited-access bits for the bit field [DRN] are presented in .
Vector Registers
The OpenPOWER vector-category instruction repertoire provides the ability to reference 32 vector registers, each 128 bits wide, of the vector-scalar register file, and a special-purpose register, VSCR. Throughout this document, the symbol vN, where N is a register number, refers to vector register N.

Vector Register Roles
Register   Preservation Rules   Purpose
v0–v1      Volatile             Local variables.
v2–v13     Volatile             Used for parameter passing and return values.
v14–v19    Volatile             Local variables.
v20–v31    Nonvolatile          Local variables.
VSCR       Limited-access       32-bit Vector Status and Control Register.
Preservation rules governing the limited-access bits for the bit field [NJ] are presented in .
IEEE BINARY 128 QUADRUPLE PRECISION
Parameters and function results in IEEE BINARY 128 QUADRUPLE PRECISION format shall be passed in a single 128-bit vector register, as if they were vector values.

IBM EXTENDED PRECISION
Parameters and function results in the IBM EXTENDED PRECISION format, which consists of a pair of double-precision floating-point values, shall be passed in two successive floating-point registers. If only one of the two values can be passed in a floating-point register, the second value is passed in a GPR or in memory, in accordance with the parameter passing rules for structure aggregates.
Limited-Access Bits
The Power ISA identifies a number of registers that have mutability limited to the specific bit fields indicated in the following list:
- FPSCR [VE]: The Floating-Point Invalid Operation Exception Enable bit [VE] of the FPSCR register.
- FPSCR [OE]: The Floating-Point Overflow Exception Enable bit [OE] of the FPSCR register.
- FPSCR [UE]: The Floating-Point Underflow Exception Enable bit [UE] of the FPSCR register.
- FPSCR [ZE]: The Floating-Point Zero Divide Exception Enable bit [ZE] of the FPSCR register.
- FPSCR [XE]: The Floating-Point Inexact Exception Enable bit [XE] of the FPSCR register.
- FPSCR [RN]: The Binary Floating-Point Rounding Control field [RN] of the FPSCR register.
- FPSCR [DRN]: The DFP Rounding Control field [DRN] of the 64-bit FPSCR register.
- VSCR [NJ]: The Vector Non-Java Mode field [NJ] of the VSCR register.

The bits composing these bit fields are identified as limited access because this ABI manages how they are to be modified and preserved across function calls. Limited-access bits may be changed across function calls only if the called function has specific permission to do so, as indicated by the following conditions. A function without permission to change the limited-access bits across a function call shall save the value of the register before modifying the bits and restore it before returning to its calling function.

Limited-Access Conditions
Standard library functions expressly defined to change the state of limited-access bits are not constrained by nonvolatile preservation rules; for example, the fesetround( ) and feenableexcept( ) functions. All other standard library functions shall save the old value of these bits on entry, change the bits for their purpose, and restore the bits before returning.
Where a standard library function, such as qsort( ), calls functions provided by an application, the following rules shall be observed:
- The limited-access bits, on entry to the first call to such a callback, must have the values they had on entry to the library function.
- The limited-access bits, on entry to a subsequent call to such a callback, must have the values they had on exit from the previous call to such a callback.
- The limited-access bits, on exit from the library function, must have the values they had on exit from the last call to such a callback.

The compiler can directly generate code that saves and restores the limited-access bits.

The values of the limited-access bits are unspecified on entry into a signal handler because a library or user function can temporarily modify the limited-access bits when the signal is taken.

When setjmp( ) returns from its first call (also known as direct invocation), it does not change the limited-access bits. The limited-access bits have the values they had on entry to the setjmp( ) function.

When longjmp( ) is performed, it appears to be returning from a call to setjmp( ). In this instance, the limited-access bits are not restored to the values they had on entry to the setjmp( ) function.

C library functions, such as _FPU_SETCW( ) defined in <fpu_control.h>, may modify the limited-access bits of the FPSCR. Additional C99 functions that can modify the FPSCR are defined in <fenv.h>.

The vector vec_mtvscr( ) function may change the limited-access NJ bit.

The unwinder does not modify limited-access bits.

To avoid the overhead of saving and restoring the FPSCR on every call, it is only necessary to save it shortly before, and restore it after, any instructions or groups of instructions that need to change its control flags. In some cases, even that can be avoided by using instructions that override the FPSCR rounding mode.
If an exception and the resulting signal occur while the FPSCR is temporarily modified, the signal handler cannot rely on the default control flag settings and must behave as follows:
- If the signal handler will unwind the stack, print a traceback, and abort the program, no other special handling is needed.
- If the signal handler will adjust some register values (for example, replace a NaN with a zero or infinity) and then resume execution, no other special handling is needed, with one exception: if the signal handler changed the control flags, it should restore them.
- If the signal handler will unwind the stack part way and resume execution in a user exception handler, the application should save the FPSCR beforehand, and the exception handler should restore its control flags.
The Stack Frame
A function shall establish a stack frame if it requires the use of nonvolatile registers, if its local variable usage cannot be optimized into registers and the protected zone, or if it calls another function. For more information about the protected zone, see . Such a function need only allocate space for the required minimal stack frame, consisting of a back-chain doubleword (optionally containing a back-chain pointer), the saved CR word, a reserved word, the saved LR doubleword, and the saved TOC pointer doubleword.

The Stack Frame Organization figure shows the relative layout of an allocated stack frame following a nonleaf function call, where the stack pointer points to the back-chain word of the caller's stack frame. By default, the stack pointer always points to the back-chain word of the most recently allocated stack frame. For more information, see .
Stack Frame Organization
In the Stack Frame Organization figure, the white areas indicate an optional save area of the stack frame. For a description of the optional save areas described by this ABI, see .
General Stack Frame Requirements
The following general requirements apply to all stack frames:
- The stack shall be quadword aligned.
- The minimum stack frame size shall be 32 bytes. A minimum stack frame consists of the first 4 doublewords (back-chain doubleword, CR save word and reserved word, LR save doubleword, and TOC pointer doubleword), with padding to meet the 16-byte alignment requirement.
- There is no maximum stack frame size defined.
- Padding shall be added to the Local Variable Space of the stack frame to maintain the defined stack frame alignment.
- The stack pointer, r1, shall always point to the lowest address doubleword of the most recently allocated stack frame.
- The stack shall start at high addresses and grow downward toward lower addresses.
- The lowest address doubleword (the back-chain word in ) shall point to the previously allocated stack frame when a back chain is present. As an exception, the first stack frame shall have a value of 0 (NULL).
- If required, the stack pointer shall be decremented in the called function's prologue and restored in the called function's epilogue. See .
- Before a function calls any other functions, it shall save the value of the LR register into the LR save doubleword of the caller's stack frame.
- An optional frame pointer may be created, if necessary, to address arguments or local variables (for example, as a result of dynamic allocation on the stack as described in ).

An example of a minimum stack frame allocation that meets these requirements is shown in the Minimum Stack Frame Allocation with and without Back Chain figure.
Minimum Stack Frame Allocation with and without Back Chain
Minimum Stack Frame Elements

Back Chain Doubleword
When a back chain is not present, the compiler must provide, for all languages and regardless of language features, alternate information compatible with the ABI unwind framework to unwind a stack. A compiler that does not provide such system-compatible unwind information must generate a back chain. All compilers shall generate back chain information by default, and default libraries shall contain a back chain. On systems where system-wide unwind capabilities are not provided, compilers must not generate object files without back-chain generation. When system-wide unwind capabilities are provided, a system shall provide a programmatic interface to query unwind information.

CR Save Word
If a function changes the value in any nonvolatile field of the condition register, it shall first save at least the value of those nonvolatile fields of the condition register, to restore before function exit. The caller frame CR Save Word may be used as the save location. This location in the current frame may be used as temporary storage, which is volatile over function calls.

Reserved Word
This word is reserved for system functions. Modifications of the value contained in this word are prohibited unless explicitly allowed by future ABI amendments.

LR Save Doubleword
If a function changes the value of the link register, it must first save the old value to restore before function exit. The caller frame LR Save Doubleword may be used as the save location. This location in the current frame may be used as temporary storage, which is volatile over a function call.

TOC Pointer Doubleword
If a function changes the value of the TOC pointer register, it shall first save it in the TOC pointer doubleword.
Optional Save Areas
This ABI provides a stack frame with a number of optional save areas. These areas are always present, but may be of size 0. This section indicates the relative position of these save areas in relation to each other and to the primary elements of the stack frame.

Because the back-chain word of a stack frame must maintain quadword alignment, a reserved word is introduced above the CR save word to provide a quadword-aligned minimal stack frame and to align the doublewords within the fixed stack frame portion at doubleword boundaries.

An optional alignment padding to a quadword-boundary element might be necessary above the Vector Register Save Area to provide 16-byte alignment, as shown in .

Floating-Point Register Save Area
If a function changes the value in any nonvolatile floating-point register fN, it shall first save the value in fN in the Floating-Point Register Save Area and restore the register upon function exit. If full unwind information such as DWARF is present, registers can be saved in arbitrary locations in the stack frame. If the system floating-point register save and restore functions are to be used, the floating-point registers shall be saved in a contiguous range. Floating-point register fN is saved in the doubleword located 8 × (32 – N) bytes before the back-chain word of the previous frame, as shown in . The Floating-Point Register Save Area is always doubleword aligned. The size of the Floating-Point Register Save Area depends upon the number of floating-point registers that must be saved. If no floating-point registers are to be saved, the Floating-Point Register Save Area has a zero size.

General-Purpose Register Save Area
If a function changes the value in any nonvolatile general-purpose register rN, it shall first save the value in rN in the General-Purpose Register Save Area and restore the register upon function exit.
If full unwind information such as DWARF is present, registers can be saved in arbitrary locations in the stack frame. If the system general-purpose register save and restore functions are to be used, the general-purpose registers shall be saved in a contiguous range. General-purpose register rN is saved in the doubleword located 8 × (32 – N) bytes before the Floating-Point Register Save Area, as shown in . The General-Purpose Register Save Area is always doubleword aligned. The size of the General-Purpose Register Save Area depends upon the number of general-purpose registers that must be saved. If no general-purpose registers are to be saved, the General-Purpose Register Save Area has a zero size.

Vector Register Save Area
If a function changes the value in any nonvolatile vector register vN, it shall first save the value in vN in the Vector Register Save Area and restore the register upon function exit. If full unwind information such as DWARF is present, registers can be saved in arbitrary locations in the stack frame. If the system vector register save and restore functions are to be used, the vector registers shall be saved in a contiguous range. Vector register vN is saved in the quadword located 16 × (32 – N) bytes before the General-Purpose Register Save Area plus alignment padding, as shown in . The Vector Register Save Area is always quadword aligned. If necessary to ensure suitable alignment of the vector save area, a padding doubleword may be introduced between the Vector Register and General-Purpose Register Save Areas, and/or the Local Variable Space may be expanded to the next quadword boundary. The size of the Vector Register Save Area depends upon the number of vector registers that must be saved. It ranges from 0 bytes to a maximum of 192 bytes (12 × 16). If no vector registers are to be saved, the Vector Register Save Area has a zero size.

Local Variable Space
The Local Variable Space is used for allocation of local variables.
The Local Variable Space is located immediately above the Parameter Save Area, at a higher address. There is no restriction on the size of this area. Sometimes a register spill area is needed; it is typically positioned above the Local Variable Space. The Local Variable Space also contains any parameters that need to be assigned a memory address when the function's parameter list does not require a save area to be allocated by the caller.

Parameter Save Area
The Parameter Save Area shall be allocated by the caller for function calls unless a prototype is provided for the callee indicating that all parameters can be passed in registers. (This requires a Parameter Save Area to be created for functions where the number and type of parameters exceed the registers available for parameter passing, for functions where the prototype contains an ellipsis to indicate a variadic function, and for functions declared without a prototype.)

When the caller allocates the Parameter Save Area, it is always automatically quadword aligned because it must always start at SP + 32. It shall be at least 8 doublewords in length. If a function needs to pass more than 8 doublewords of arguments, the Parameter Save Area shall be large enough to spill all register-based parameters and to contain the arguments that the caller stores in it.

The calling function cannot expect that the contents of this save area are valid when returning from the callee.

The Parameter Save Area, which is located at a fixed offset of 32 bytes from the stack pointer, is reserved in each stack frame for use as an argument list when an in-memory argument list is required.
For example, a Parameter Save Area must be allocated by the caller when calling functions with the following characteristics:
- Prototyped functions where the parameters cannot be contained in the parameter registers
- Prototyped functions with variadic arguments
- Functions without a suitable declaration available to the caller to determine the called function's characteristics (for example, functions in C without a prototype in scope, in accordance with Brian Kernighan and Dennis Ritchie, The C Programming Language, 1st edition)

Under these circumstances, a minimum of 8 doublewords is always reserved. The size of this area must be sufficient to hold the longest argument list being passed by the function that owns the stack frame. Although not all arguments for a particular call are located in storage, when an in-memory parameter list is required, consider the parameters to be forming a list in this area. Each argument occupies one or more doublewords.

More arguments might be passed than can be stored in the parameter registers. In that case, the remaining arguments are stored in the Parameter Save Area. The values passed on the stack are identical to the values placed in registers. Therefore, the stack contains register images for the values that are not placed into registers.

This ABI uses a simple va_list type for variable lists to point to the memory location of the next parameter. Therefore, regardless of type, variable arguments must always be in the same location so that they can be found at runtime. The first 8 doublewords are located in general registers r3–r10. Any additional doublewords are located in the stack Parameter Save Area. Alignment requirements such as those for vector types may require the va_list pointer to first be aligned before accessing a value.

Follow these rules for parameter passing:
- Map each argument to enough doublewords in the Parameter Save Area to hold its value.
- Map single-precision floating-point values to the least-significant word in a single doubleword.
- Map double-precision floating-point values to a single doubleword.
- Map simple integer types (char, short, int, long, enum) to a single doubleword. Sign or zero extend values shorter than a doubleword to a doubleword based on whether the source data type is signed or unsigned.
- When 128-bit integer types are passed by value, map each to two consecutive GPRs, two consecutive doublewords, or a GPR and a doubleword. In big-endian environments, the most-significant doubleword of the quadword (__int128) parameter is stored in the lower-numbered GPR or parameter word, and the least-significant doubleword is stored in the higher-numbered GPR or parameter word. In little-endian environments, the least-significant doubleword of the quadword (__int128) parameter is stored in the lower-numbered GPR or parameter word, and the most-significant doubleword is stored in the higher-numbered GPR or parameter word. The required alignment of int128 data types is 16 bytes. Therefore, when the incoming parameter is not aligned at a 16-byte boundary, by-value parameters must be copied to a new location in the local variable area of the callee's stack frame before the address of the type can be provided (for example, using the address-of operator, or when the variable is to be passed by reference).
- If extended precision floating-point values in IEEE BINARY 128 QUADRUPLE PRECISION format are supported (see ), map them to a single quadword, quadword aligned. This might result in skipped doublewords in the Parameter Save Area.
- If extended precision floating-point values in IBM EXTENDED PRECISION format are supported (see ), map them to two consecutive doublewords. The required alignment of IBM EXTENDED PRECISION data types is 16 bytes.
  Therefore, when the incoming parameter is not aligned at a 16-byte boundary, by-value parameters must be copied to a new location in the local variable area of the callee's stack frame before the address of the type can be provided (for example, using the address-of operator, or when the variable is to be passed by reference).
- Map complex floating-point and complex integer types as if the argument was specified as separate real and imaginary parts.
- Map pointers to a single doubleword.
- Map vectors to a single quadword, quadword aligned. This might result in skipped doublewords in the Parameter Save Area.
- Map fixed-size aggregates and unions passed by value to as many doublewords of the Parameter Save Area as the value uses in memory. Align aggregates and unions as follows:
  - Aggregates that contain qualified floating-point or vector arguments are normally aligned at the alignment of their base type. For more information about qualified arguments, see .
  - Other aggregates are normally aligned in accordance with the aggregate's defined alignment.
  - The alignment will never be larger than the stack frame alignment (16 bytes).
  This might result in doublewords being skipped for alignment. When a doubleword in the Parameter Save Area (or its GPR copy) contains at least a portion of a structure, that doubleword must contain all other portions mapping to the same doubleword. (That is, a doubleword can either be completely valid or completely invalid, but not partially valid and invalid, except in the last doubleword where invalid padding may be present.)
- Pad an aggregate or union smaller than one doubleword in size, but having a non-zero size, so that it is in the least-significant bits of the doubleword. Pad all others, if necessary, at their tail. Variable size aggregates or unions are passed by reference.
- Map other scalar values to the number of doublewords required by their size.
- Future data types that have an architecturally defined quadword-required alignment will be aligned at a quadword boundary.

If the callee has a known prototype, arguments are converted to the type of the corresponding parameter when loaded to their parameter registers or when being mapped into the Parameter Save Area. For example, if a long is used as an argument to a double parameter, the value is converted to double precision and mapped to a doubleword in the Parameter Save Area.
Protected Zone
The 288 bytes below the stack pointer are available as volatile program storage that is not preserved across function calls. Interrupt handlers and any other functions that might run without an explicit call must take care to preserve a protected zone, also referred to as the red zone, of 512 bytes that consists of:
- The 288-byte volatile program storage region that is used to hold saved registers and local variables
- An additional 224 bytes below the volatile program storage region that is set aside as a volatile system storage region for system functions

If a function does not call other functions and does not need more stack space than is available in the volatile program storage region (that is, 288 bytes), it does not need to have a stack frame. The 224-byte volatile system storage region is not available to compilers for allocation to saved registers and local variables.
Parameter Passing in Registers
For the OpenPOWER Architecture, it is more efficient to pass arguments to functions in registers rather than through memory. For more information about passing parameters through memory, see . For the OpenPOWER ABI, the following parameters can be passed in registers:
- Up to eight arguments can be passed in general-purpose registers r3–r10.
- Up to thirteen qualified floating-point arguments can be passed in floating-point registers f1–f13, or up to twelve in vector registers v2–v13.
- Up to thirteen single-precision or double-precision decimal floating-point arguments can be passed in floating-point registers f1–f13.
- Up to six quad-precision decimal floating-point arguments can be passed in even-odd floating-point register pairs f2–f13.
- Up to 12 qualified vector arguments can be passed in v2–v13.

A qualified floating-point argument corresponds to:
- A scalar floating-point data type
- Each member of a complex floating-point type
- A member of a homogeneous aggregate of multiple like data types passed in up to eight floating-point registers

A homogeneous aggregate can consist of a variety of nested constructs, including structures, unions, and array members, which shall be traversed to determine the types and number of members of the base floating-point type. (A complex floating-point data type is treated as if two separate scalar values of the base type were passed.) Homogeneous floating-point aggregates can have up to four IBM EXTENDED PRECISION members, four IEEE BINARY 128 QUADRUPLE PRECISION members, four _Decimal128 members, or eight members of other floating-point types. (Unions are treated as their largest member. For homogeneous unions, different union alternatives may have different sizes, provided that all union members are homogeneous with respect to each other.) They are passed in floating-point registers if parameters of that type would be passed in floating-point registers.
They are passed in vector registers if parameters of that type would be passed in vector registers. They are passed as if each member was specified as a separate parameter.

A qualified vector argument corresponds to:
- A vector data type
- A member of a homogeneous aggregate of multiple like data types passed in up to eight vector registers
- Any future type requiring 16-byte alignment (see ) or processed in vector registers

For the purpose of determining a qualified floating-point argument, _Float128 shall be considered a vector data type. In addition, _Float128 is like a vector data type for determining if multiple aggregate members are like.

A homogeneous aggregate can consist of a variety of nested constructs, including structures, unions, and array members, which shall be traversed to determine the types and number of members of the base vector type. Homogeneous vector aggregates with up to eight members are passed in up to eight vector registers as if each member was specified as a separate parameter. (Unions are treated as their largest member. For homogeneous unions, different union alternatives may have different sizes, provided that all union members are homogeneous with respect to each other.)

Floating-point and vector aggregates that contain padding words and integer fields with a width of 0 should not be treated as homogeneous aggregates.

A homogeneous aggregate is either a homogeneous floating-point aggregate or a homogeneous vector aggregate. This ABI does not specify homogeneous aggregates for integer types.

Binary extended precision numbers in IEEE BINARY 128 QUADRUPLE PRECISION format (see ) are passed using a VMX register. Binary extended precision numbers in IBM EXTENDED PRECISION format (see ) are passed using two successive floating-point registers. Single-precision decimal floating-point numbers (see ) are passed in the lower half of a floating-point register.
Quad-precision decimal floating-point numbers (see ) are passed using an even/odd floating-point register pair. A floating-point register might be skipped to allocate an even/odd register pair when necessary. When a floating-point register is skipped, no corresponding memory word is skipped in the natural home location; that is, the corresponding GPR or memory doubleword in the parameter list.

All other aggregates are passed in consecutive GPRs, in GPRs and in memory, or in memory.

When a parameter is passed in a floating-point or vector register, a number of GPRs are skipped, in allocation order, commensurate with the size of the corresponding in-memory representation of the passed argument's type.

The parameter size is always rounded up to the next multiple of a doubleword. Consequently, each parameter of a non-zero size is allocated to at least one doubleword.

Full doubleword rule: When a doubleword in the Parameter Save Area (or its GPR copy) contains at least a portion of a structure, that doubleword must contain all other portions mapping to the same doubleword. (That is, a doubleword can either be completely valid or completely invalid, but not partially valid and invalid, except in the last doubleword where invalid padding may be present.)

IEEE BINARY 128 QUADRUPLE PRECISION
Up to 12 quad-precision parameters can be passed in v2–v13. For the purpose of determining qualified floating-point and vector arguments, an IEEE 128b type shall be considered a "like" vector type, and a complex _Float128 shall be treated as two individual scalar elements.

IBM EXTENDED PRECISION
IBM EXTENDED PRECISION format parameters are passed as if they were a struct consisting of separate double parameters. IBM EXTENDED PRECISION format parameters shall be considered as a distinct type for the determination of homogeneous aggregates.

If fewer arguments are needed, the unused registers defined previously will contain undefined values on entry to the called function.
If there are more arguments than registers or no function prototype is provided, a function must provide space for all arguments in its stack frame. When this happens, only the minimum storage needed to contain all arguments (including allocating space for parameters passed in registers) needs to be allocated in the stack frame.

General-purpose registers r3–r10 correspond to the allocation of parameters to the first 8 doublewords of the Parameter Save Area. Specifically, this requires a suitable number of general-purpose registers to be skipped to correspond to parameters passed in floating-point and vector registers.

For an unnamed parameter that corresponds to the ellipsis, the caller shall promote float values to double and shall pass the parameter in a GPR or in the Parameter Save Area.

If no function prototype is available, the caller shall promote float values to double and pass floating-point parameters both in the available floating-point registers and in the Parameter Save Area. Likewise, the caller shall pass vector parameters both in the available vector registers and in the Parameter Save Area. (If the callee expects a float parameter, the result will be incorrect.)

It is the callee's responsibility to allocate storage for the stored data in the local variable area. When the callee's parameter list indicates that the caller must allocate the Parameter Save Area (because at least one parameter must be passed in memory or an ellipsis is present in the prototype), the callee may use the preallocated Parameter Save Area to save incoming parameters.
Parameter Passing Register Selection Algorithm
The following algorithm describes where arguments are passed for the C language. In this algorithm, arguments are assumed to be ordered from left (first argument) to right. The actual order of evaluation for arguments is unspecified.
gr contains the number of the next available general-purpose register.
fr contains the number of the next available floating-point register.
vr contains the number of the next available vector register.
The following types refer to the type of the argument as declared by the function prototype. The argument values are converted (if necessary) to the types of the prototype arguments before passing them to the called function. If a prototype is not present, or it is a variable argument prototype and the argument is after the ellipsis, the type refers to the type of the data objects being passed to the called function.
INITIALIZE:
    If the function return type requires a storage buffer, set gr = 4;
    else set gr = 3.
    Set fr = 1.
    Set vr = 2.
SCAN:
    If there are no more arguments, terminate.
    Otherwise, allocate as follows based on the class of the function argument:
    switch (class(argument))
    unnamed parameter:
        if gr > 10
            goto mem_argument
        size = size_in_DW(argument)
        reg_size = min(size, 11 - gr)
        pass(GPR, gr, first_n_DW(argument, reg_size))
        gr += reg_size
        if remaining_members
            argument = after_n_DW(argument, reg_size)
            goto mem_argument
        break
    integer:    // up to 64b
    pointer:    // this also includes all pass-by-reference values
        if gr > 10
            goto mem_argument
        pass(GPR, gr, argument)
        gr++
        break
    aggregate:
        if (homogeneous(argument, float) and regs_needed(members(argument)) <= 8)
            if (register_type_used(type(argument)) == vr)
                goto use_vrs
            n_fregs = n_fregs_for_type(member_type(argument, 0))
            agg_size = members(argument) * n_fregs
            reg_size = min(agg_size, 14 - fr)      // f1-f13 are parameter FPRs
            pass(FPR, fr, first_n_DW(argument, reg_size))
            fr += reg_size
            gr += size_in_DW(first_n_DW(argument, reg_size))
            if remaining_members
                argument = after_n_DW(argument, reg_size)
                goto gpr_struct
            break
        if (homogeneous(argument, vector) and members(argument) <= 8)
        use_vrs:
            agg_size = members(argument)
            reg_size = min(agg_size, 14 - vr)      // v2-v13 are parameter VRs
            if (gr & 1) == 0                       // align vector in memory
                gr++
            pass(VR, vr, first_n_elements(argument, reg_size))
            vr += reg_size
            gr += size_in_DW(first_n_elements(argument, reg_size))
            if remaining_members
                argument = after_n_elements(argument, reg_size)
                goto gpr_struct
            break
        gpr_struct:
        if gr > 10
            goto mem_argument
        size = size_in_DW(argument)
        reg_size = min(size, 11 - gr)
        pass(GPR, gr, first_n_DW(argument, reg_size))
        gr += size_in_DW(first_n_DW(argument, reg_size))
        if remaining_members
            argument = after_n_DW(argument, reg_size)
            goto mem_argument
        break
    float:
        // float is passed in one FPR.
        // double is passed in one FPR.
        // IBM EXTENDED PRECISION is passed in the next two FPRs.
        // IEEE BINARY 128 QUADRUPLE PRECISION is passed in one VR.
        // _Decimal32 is passed in the lower half of one FPR.
        // _Decimal64 is passed in one FPR.
        // _Decimal128 is passed in an even-odd FPR pair, skipping an FPR if necessary.
        if (register_type_used(type(argument)) == vr)
            // Assumes == vr is true for IEEE BINARY 128 QUADRUPLE PRECISION.
            goto use_vr
        fr += align_pad(fr, type(argument))
            // Assumes align_pad = 1 for _Decimal128 if fr is odd; otherwise = 0.
        if fr > 13
            goto mem_argument
        n_fregs = n_fregs_for_type(argument)
            // Assumes n_fregs_for_type == 2 for IBM EXTENDED PRECISION
            // or _Decimal128, == 1 for float, double, _Decimal32 or _Decimal64.
        pass(FPR, fr, argument)
        fr += n_fregs
        gr += size_in_DW(argument)
        break
    vector:
    use_vr:
        if vr > 13
            goto mem_argument
        if (gr & 1) == 0                           // align vector in memory
            gr++
        pass(VR, vr, argument)
        vr++
        gr += 2
        break
    next argument
mem_argument:
    need_save_area = TRUE
    pass(stack, gr, argument)
    gr += size_in_DW(argument)
    next argument
All complex data types are handled as if two scalar values of the base type were passed as separate parameters.
If the callee takes the address of any of its parameters, values passed in registers are stored to memory. It is the callee's responsibility to allocate storage for the stored data in the local variable area. When the callee's parameter list indicates that the caller must allocate the Parameter Save Area (because at least one parameter must be passed in memory, or an ellipsis is present in the prototype), the callee may use the preallocated Parameter Save Area to save incoming parameters. (If an ellipsis is present, using the preallocated Parameter Save Area ensures that all arguments are contiguous.)
If the compilation unit for the caller contains a function prototype, but the callee has a mismatching definition, this may result in the wrong values being stored. If the declaration of a function that is used by the caller does not match the definition for the called function, corruption of the caller's stack space can occur.
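To make the bookkeeping concrete, the following C sketch implements only the integer/pointer, float/double and vector cases of the algorithm. It assumes a call with a prototype in scope and no returned-aggregate buffer, so gr starts at 3. All names (assign_args, enum arg_class, struct placement) are hypothetical. Aggregates, unnamed arguments, IBM EXTENDED PRECISION, _Decimal128 and IEEE 128-bit values are left out. FPRs are assumed to run from f1 to f13. The sketch also quadword-aligns the memory slot of a vector that overflows to memory, an assumption the pseudocode does not spell out.

```c
/* Hypothetical argument classes covering a subset of the algorithm. */
enum arg_class { ARG_INT, ARG_FP, ARG_VEC };

struct placement {
    char kind;          /* 'r' = GPR, 'f' = FPR, 'v' = VR, 'm' = memory only */
    int reg;            /* register number, or -1 when kind == 'm' */
    unsigned offset;    /* byte offset of the argument's Parameter Save Area slot */
};

/* Assign locations to n arguments; return 1 if a Parameter Save Area
 * must be allocated (some argument went to memory), else 0. */
static int assign_args(const enum arg_class *cls, int n, struct placement *out)
{
    int gr = 3, fr = 1, vr = 2;        /* INITIALIZE, no return buffer */
    int need_save_area = 0;

    for (int i = 0; i < n; i++) {
        struct placement *p = &out[i];

        /* Vectors are quadword aligned: skip a doubleword if needed. */
        if (cls[i] == ARG_VEC && (gr & 1) == 0)
            gr++;
        p->offset = (unsigned)(gr - 3) * 8;

        switch (cls[i]) {
        case ARG_INT:                  /* integer or pointer, up to 64 bits */
            if (gr <= 10) { p->kind = 'r'; p->reg = gr; }
            else          { p->kind = 'm'; p->reg = -1; need_save_area = 1; }
            gr += 1;
            break;
        case ARG_FP:                   /* float or double, f1..f13 */
            if (fr <= 13) { p->kind = 'f'; p->reg = fr++; }
            else          { p->kind = 'm'; p->reg = -1; need_save_area = 1; }
            gr += 1;                   /* skip the shadowing GPR/doubleword */
            break;
        case ARG_VEC:                  /* 16-byte vector, v2..v13 */
            if (vr <= 13) { p->kind = 'v'; p->reg = vr++; }
            else          { p->kind = 'm'; p->reg = -1; need_save_area = 1; }
            gr += 2;
            break;
        }
    }
    return need_save_area;
}
```

Given the argument classes of func5() from the examples later in this section (int, vector, float, vector, int, and char treated as an integer), the sketch makes the following assignments. s1 goes in r3. s2 goes in v2 at offset 16. s3 goes in f1 at offset 32. s4 goes in v3 at offset 48. s5 and s6 go to memory at offsets 64 and 72. The sketch also reports that a Parameter Save Area is needed, which matches that table.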
Parameter Passing Examples
This section provides some examples that use the algorithm described in . The following figure shows how parameters are passed for a function that passes arguments in GPRs, FPRs, and memory.
Passing Arguments in GPRs, FPRs, and Memory
typedef struct {
    int a;
    double dd;
} sparm;
sparm s, t;
int c, d, e;
long double ld; /* IBM EXTENDED PRECISION format */
double ff, gg, hh;
x = func(c, ff, d, ld, s, gg, t, e, hh);

Parameter   Register   Offset in parameter save area
c           r3         0–7 (not stored in parameter save area)
ff          f1         8–15 (not stored)
d           r5         16–23 (not stored)
ld          f2, f3     24–39 (not stored)
s           r8, r9     40–55 (not stored)
gg          f4         56–63 (not stored)
t           (none)     64–79 (stored in parameter save area)
e           (none)     80–87 (stored)
hh          f5         88–95 (not stored)
If a prototype is not in scope:
The floating-point argument ff is also passed in r4.
The long double argument ld is also passed in r6 and r7.
The floating-point argument gg is also passed in r10.
The floating-point argument hh is also stored into the Parameter Save Area.
If a prototype containing an ellipsis describes any of these floating-point arguments as being part of the variable argument part, the general registers and Parameter Save Area are used as when no prototype is in scope. The floating-point registers are not used.
The following definitions are used in the remaining examples of parameter passing.
Parameter Passing Definitions
typedef struct {
    double a;
    double b;
} dpfp2;
typedef struct {
    float a;
    float b;
} spfp2;
double a1, a4;
dpfp2 a2, a3;
spfp2 a6, a7;
double func2(double a, dpfp2 p1, dpfp2 p2, double b, int x);
double func3(double a, dpfp2 p1, dpfp2 p2, double b, int x, spfp2 p3, spfp2 p4);
struct three_floats { float a, b, c; };
struct two_floats { float a, b; };
The following figure shows how parameters are passed for a function that passes homogeneous floating-point aggregates and integer parameters in registers without allocating a Parameter Save Area, because all the parameters can be contained in the registers.
Passing Homogeneous Floating-Point Aggregates and Integer Parameters in Registers without a Parameter Save Area
x = func2(a1, a2, a3, a4, 5);

Parameter   Register   Offset in parameter save area
a1          f1         n/a
a2.a        f2         n/a
a2.b        f3         n/a
a3.a        f4         n/a
a3.b        f5         n/a
a4          f6         n/a
5           r9         n/a
The following figure shows how parameters are passed for a second function that passes homogeneous floating-point aggregates and integer parameters in registers without allocating a Parameter Save Area, because all parameters can be passed in registers.
Passing Homogeneous Floating-Point Aggregates and Integer Parameters in Registers without a Parameter Save Area
x = func3(a1, a2, a3, a4, 5, a6, a7);

Parameter   Register   Offset in parameter save area
a1          f1         n/a
a2.a        f2         n/a
a2.b        f3         n/a
a3.a        f4         n/a
a3.b        f5         n/a
a4          f6         n/a
5           r9         n/a
a6.a        f7         n/a
a6.b        f8         n/a
a7.a        f9         n/a
a7.b        f10        n/a
The following figure shows how parameters are passed for a function that passes floating-point scalars and homogeneous floating-point aggregates in registers and memory, because the number of available parameter registers has been exceeded. It demonstrates the full doubleword rule.
Passing Floating-Point Scalars and Homogeneous Floating-Point Aggregates in Registers and Memory
x = oddity (float d1, float d2, float d3, float d4, float d5, float d6, float d7, float d8, float d9, float d10, float d11, float d12, struct three_floats x)

Parameter   Register   Offset in parameter save area
d1          f1         0 (not stored)
d2          f2         8 (not stored)
d3          f3         16 (not stored)
d4          f4         24 (not stored)
d5          f5         32 (not stored)
d6          f6         40 (not stored)
d7          f7         48 (not stored)
d8          f8         56 (not stored)
d9          f9         64 (not stored)
d10         f10        72 (not stored)
d11         f11        80 (not stored)
d12         f12        88 (not stored)
x.a         f13        96 (stored, because of the full doubleword rule)
x.b         -          100 (stored)
x.c         -          104 (stored)
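The full doubleword rule can be expressed as a check over member offsets. This sketch uses a hypothetical function, full_dw_rule(). It assumes that the aggregate starts on a doubleword boundary and that its first in_regs members were assigned to registers in order. Take struct three_floats with only x.a in f13, as in the table above. The check marks all three members as stored, because x.a shares its doubleword with x.b.

```c
#include <stddef.h>

/* Hypothetical check: member_off[i] is the byte offset of member i
 * within the aggregate, and members 0..in_regs-1 received registers.
 * must_store[i] is set to 1 if member i has to be written to memory,
 * either because it has no register, or because it shares a
 * doubleword with a member that has no register. */
static void full_dw_rule(const size_t *member_off, size_t n,
                         size_t in_regs, int *must_store)
{
    for (size_t i = 0; i < n; i++) {
        must_store[i] = (i >= in_regs);
        for (size_t j = in_regs; j < n && !must_store[i]; j++)
            if (member_off[j] / 8 == member_off[i] / 8)
                must_store[i] = 1;   /* same doubleword as a memory member */
    }
}
```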
The following figure shows how parameters are passed for a function that passes homogeneous floating-point aggregates and floating-point scalars in general-purpose registers, because the number of available floating-point registers has been exceeded. In this figure, a Parameter Save Area is not allocated because all the parameters can be passed in registers.
Passing Floating-Point Scalars and Homogeneous Floating-Point Aggregates in FPRs and GPRs without a Parameter Save Area
x = oddity2 (struct two_floats s1, struct two_floats s2, struct two_floats s3, struct two_floats s4, struct two_floats s5, struct two_floats s6, struct two_floats s7, struct two_floats s8)

Parameter   Register   Offset in parameter save area
s1.a        f1         n/a
s1.b        f2         n/a
s2.a        f3         n/a
s2.b        f4         n/a
s3.a        f5         n/a
s3.b        f6         n/a
s4.a        f7         n/a
s4.b        f8         n/a
s5.a        f9         n/a
s5.b        f10        n/a
s6.a        f11        n/a
s6.b        f12        n/a
s7.a        f13        n/a
s7.b        -          n/a
s7          gpr9       n/a
s8          gpr10      n/a
The following figure shows how parameters are passed for a function that passes homogeneous floating-point aggregates in FPRs, GPRs, and memory, because the number of available floating-point and integer parameter registers has been exceeded. In this figure, a Parameter Save Area is allocated because not all of the parameters can be passed in registers. This figure also demonstrates the full doubleword rule applied to s7, which is passed in GPR9.
Passing Homogeneous Floating-Point Aggregates in FPRs, GPRs, and Memory with a Parameter Save Area
x = oddity3 (struct two_floats s1, struct two_floats s2, struct two_floats s3, struct two_floats s4, struct two_floats s5, struct two_floats s6, struct two_floats s7, struct two_floats s8, struct two_floats s9)

Parameter   Register   Offset in parameter save area
s1.a        f1         0 (not stored)
s1.b        f2         4 (not stored)
s2.a        f3         8 (not stored)
s2.b        f4         12 (not stored)
s3.a        f5         16 (not stored)
s3.b        f6         20 (not stored)
s4.a        f7         24 (not stored)
s4.b        f8         28 (not stored)
s5.a        f9         32 (not stored)
s5.b        f10        36 (not stored)
s6.a        f11        40 (not stored)
s6.b        f12        44 (not stored)
s7.a        f13        48 (not stored, SPFP in FPR)
s7.b        -          52 (not stored)
s7          gpr9       48 (not stored, full gpr)
s8          gpr10      56 (not stored, full gpr)
s9          -          64 (stored)
The following figure shows how parameters are passed for a function that passes vector data types in VRs, GPRs, and FPRs. In this figure, a Parameter Save Area is not allocated.
Passing Vector Data Types without Parameter Save Area
x = func4(int s1, vector float s2, float s3, vector int s4, vector char s5)

Parameter   Register   Offset in parameter save area
s1          gpr3       n/a
s2          v2         n/a
s3          f1         n/a
s4          v3         n/a
s5          v4         n/a
The following figure shows how parameters are passed for a function that passes vector data types in VRs, GPRs, and FPRs. In this figure, a Parameter Save Area is allocated.
Passing Vector Data Types with a Parameter Save Area
x = func5(int s1, vector float s2, float s3, vector int s4, int s5, char s6)

Parameter   Register   Offset in parameter save area
s1          gpr3       0 (not stored)
s2          v2         16 (not stored)
s3          f1         32 (not stored)
s4          v3         48 (not stored)
s5          -          64 (stored)
s6          -          72 (stored)
When a function takes the address of at least one of its arguments, it is the callee's responsibility to store function parameters in memory and provide a suitable memory address for parameters passed in registers.
For functions where all parameters can be contained in the parameter registers and that have no ellipsis, the callee shall allocate saved parameters in its local variable save area, because the caller may not have allocated a Parameter Save Area. This can be performed, for example, in the prologue.
For functions where the caller must allocate a Parameter Save Area, either because at least one parameter must be passed in memory or because the prototype has an ellipsis indicating a variadic function, references to named parameters may be spilled to the Parameter Save Area.
Variable Argument Lists
C programs that are intended to be portable across different compilers and architectures must use the header file <stdarg.h> to deal with variable argument lists. This header file contains a set of macro definitions that define how to step through an argument list. The implementation of this header file may vary across different architectures, but the interface is the same.
C programs that do not use this header file for the variable argument list and assume that all the arguments are passed on the stack in increasing order are not portable, especially on architectures that pass some of the arguments in registers. The Power Architecture is one of the architectures that passes some of the arguments in registers.
The Parameter Save Area may be zero length and is only allocated when parameters are spilled, when a function has unnamed parameters, or when no prototype is provided. When the Parameter Save Area is allocated, it must be large enough to accommodate all parameters, including parameters passed in registers.
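A minimal portable variadic function written against <stdarg.h> might look like the following. The name average() is illustrative only. The point is that va_arg() hides where each argument actually lives (GPR or Parameter Save Area). The example also reflects that float arguments matching the ellipsis arrive promoted to double, so they are read back as double.

```c
#include <stdarg.h>

/* Return the mean of `count` double arguments (0.0 if count is 0).
 * Float arguments passed through the ellipsis are promoted to double
 * by the caller, so va_arg(ap, double) is the correct way to read
 * them. */
static double average(int count, ...)
{
    va_list ap;
    double sum = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        sum += va_arg(ap, double);
    va_end(ap);

    return count > 0 ? sum / count : 0.0;
}
```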
Return Values
Functions that return a value shall place the result in the same registers as if the return value was the first named input argument to a function, unless the return value is a nonhomogeneous aggregate larger than 2 doublewords or a homogeneous aggregate with more than eight registers. For a definition of homogeneous aggregates, see . (Homogeneous aggregates are arrays, structs, or unions of a homogeneous floating-point or vector type and of a known fixed size.) Therefore, IBM EXTENDED PRECISION values are returned in f1:f2.
Homogeneous floating-point or vector aggregate return values that consist of up to eight registers with up to eight elements will be returned in floating-point or vector registers that correspond to the parameter registers that would be used if the return value type were the first input parameter to a function.
Aggregates that are not returned by value are returned in a storage buffer provided by the caller. The address is provided as a hidden first input argument in general-purpose register r3.
Quadword decimal floating-point return values shall be returned in the first even/odd floating-point parameter register pair; that is, f2:f3.
Functions that return values of the following types shall place the result in register r3 as signed or unsigned integers, as appropriate, and sign extended or zero extended to 64 bits where necessary:
char
enum
short
int
long
pointer to any type
_Bool
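The register-versus-buffer decision for return values can be summarized as a small predicate. This is a hypothetical sketch. It assumes the caller already knows four things about the type: whether it is an aggregate, whether it is homogeneous, its size, and how many registers a homogeneous aggregate needs. It also shows the effect on argument passing. When a buffer is used, its address takes r3, so the first argument starts at r4 (gr = 4 in the register selection algorithm).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: is a return value passed back in registers?
 * Scalars always are. Non-homogeneous aggregates are if they fit in
 * 2 doublewords (16 bytes). Homogeneous aggregates are if they need
 * at most eight registers. */
static bool returned_in_registers(bool is_aggregate, bool homogeneous,
                                  size_t size_bytes, size_t regs_needed)
{
    if (!is_aggregate)
        return true;
    if (homogeneous)
        return regs_needed <= 8;
    return size_bytes <= 16;
}

/* When a caller-provided buffer is used, its address occupies r3, so
 * the first real argument starts at r4. */
static int first_arg_gpr(bool returned_in_regs)
{
    return returned_in_regs ? 3 : 4;
}
```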
Coding Examples
The following ISO C coding examples are provided as illustrations of how operations may be done, not how they shall be done, for calling functions, accessing static data, and transferring control from one part of a program to another. They are shown as code fragments with simplifications to explain addressing modes. They do not necessarily show the optimal code sequences or compiler output. The small data area is not used in any of them. For more information, see . The previous sections explicitly specify what a program, operating system, and processor may and may not assume and are the definitive reference to be used.
In these examples, absolute code and position-independent code are referenced. When instructions hold absolute addresses, a program must be loaded at a specific virtual address to permit the absolute code model to work. When instructions hold relative addresses, a program library can be loaded at various positions in virtual memory; this is referred to as the position-independent code model.
When generating code for Power ISA version 3.1 or above, this specification provides two ways to address non-local data and text. The historical method relies on a dedicated table-of-contents (TOC) pointer to obtain such addresses. Power ISA version 3.1 introduces new "PC-relative" instructions that can be used to obtain such addresses relative to the current instruction address (CIA). Both methods may be used in the same executable, dynamically shared object (DSO), object file, or even in the same function. If a function does not require a TOC pointer for addressing, it is not required to establish this pointer in register r2 and may choose not to preserve register r2's value, provided that the function's symbol table entry is appropriately annotated. Full details of function call linkage requirements are provided in .
Code Model Overview Executable modules can be built to use either position-dependent or position-independent memory references. Position-dependent references generally result in better performing programs. Static modules representing the base executables and libraries intended to be statically linked into a base executable can be compiled and linked using either position-dependent or position-independent code. Dynamic shared objects (DSOs) intended to be used as shared libraries and position-independent executables must be compiled and linked as position-independent code.
Position-Dependent Code
Static objects are preferably built by using position-dependent code. Position-dependent code can reference data in one of the following ways:
Directly by creating absolute memory addresses using a combination of instructions such as lis, addi, and memory instructions:
    lis   r16, symbol@ha
    ld    r12, symbol@l(r16)
    lis   r16, symbol2@ha
    addi  r16, r16, symbol2@l
    lvx   v1, r0, r16
By instantiating the TOC pointer in r2 and using TOC-pointer relative addressing. (For more information, see .)
    <load TOC base to r2>
    ld    r12, symbol@toc(r2)
    li    r16, symbol2@toc
    lvx   v1, r2, r16
By instantiating the TOC pointer in r2 and using GOT-indirect addressing:
    <load TOC base to r2>
    ld    r12, symbol@got(r2)
    ld    r12, 0(r12)
    ld    r12, symbol2@got(r2)
    lvx   v1, 0, r12
By using PC-relative addressing:
    pld   r12, symbol@pcrel
    plxv  v1, symbol@pcrel
In the OpenPOWER ELF V2 ABI, position-dependent code built with this addressing scheme may have a Global Offset Table (GOT) in the data segment that holds addresses. (For more information, see .) For position-dependent code, GOT entries are typically updated to reflect the absolute virtual addresses of the referenced objects at static link time. Any remaining GOT entries are updated by the loader to reflect the absolute virtual addresses that were assigned for the process. These data segments are private, while the text segments are shared.
In systems based on the Power Architecture, the GOT can be addressed with a single instruction if the GOT size is less than 65,536 bytes. A larger GOT requires more general code to access all of its entries. OpenPOWER-compliant processor hardware implementations and the linker optimizations described here work together to generate efficient code for applications with large GOTs. They use instruction fusion to combine multiple ISA instructions into a single internal operation.
Offsets from the TOC register can be generated using either:
16-bit offsets (small code model), with a maximum addressing reach of 64 KB for TOC-based relative addressing or GOT accesses
32-bit offsets (medium or large code model), with a maximum addressing reach of 4 GB
Efficient implementation of the OpenPOWER ELF V2 ABI medium code model is supported by additional optimizations present in OpenPOWER-compliant processor implementations and the OpenPOWER ABI toolchain (see ).
Position-dependent code is most efficient if the application is loaded in the first 2 GB of the address space, because direct address references and TOC-pointer initializations can be performed using a two-instruction sequence.
PC-relative offsets are usually 34 bits for all code models, with a maximum addressing reach of 16 GB. The effective addressing reach for global data is 8 GB, since data sections are always located at higher virtual addresses than text sections.
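The reach figures above follow from powers of two. The sketch below uses hypothetical helper names. It computes the total span of an N-bit signed offset and the reach in one direction only. For 34-bit PC-relative offsets, only the positive half is usable for global data, because data sections sit above text. That gives the 8 GB figure.

```c
#include <stdint.h>

/* Total span in bytes covered by an N-bit signed offset
 * (half below the base address, half above it). */
static uint64_t offset_span(unsigned bits)
{
    return UINT64_C(1) << bits;
}

/* Reach in one direction only, e.g. positive PC-relative offsets
 * from text to data placed at higher addresses. */
static uint64_t forward_reach(unsigned bits)
{
    return UINT64_C(1) << (bits - 1);
}
```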
Position-Independent Code
A shared object file is mapped with virtual addresses to avoid conflicts with other segments in the process. Because of this mapping, shared objects use position-independent code, which means that the instructions do not contain any absolute addresses. Avoiding the use of absolute addresses allows shared objects to be loaded into different virtual address spaces without code modification, which can allow multiple processes to share the same text segment for a shared object file.
Two techniques are used to deal with position-independent code:
First, branch instructions use an offset to the current effective address (EA) or use registers to hold addresses. The Power Architecture provides both EA-relative branch instructions and branch instructions that use registers. In both cases, absolute addressing is not required.
Second, when absolute addressing is required, the value can be computed with a Global Offset Table (GOT), which holds the information for address computation. Static and const references can be accessed using a TOC-pointer relative addressing model, while (shared) extern references must be accessed using the GOT-indirect addressing scheme. Both addressing schemes require a TOC pointer to be initialized.
DSOs can access data as follows:
By instantiating the TOC pointer in r2 and using TOC-pointer relative addressing (for private data):
    <load TOC base to r2>
    ld    r12, symbol@toc(r2)
    li    r16, symbol2@toc
    lvx   v1, r2, r16
By instantiating the TOC pointer in r2 and using GOT-indirect addressing (for shared data or for very large data sections):
    <load TOC base to r2>
    ld    r12, symbol@got(r2)
    ld    r12, 0(r12)
    ld    r12, symbol2@got(r2)
    lvx   v1, 0, r12
By using PC-relative addressing (for private data):
    pld   r12, symbol@pcrel
    plxv  v1, symbol@pcrel
By using PC-relative GOT-indirect addressing (for shared data):
    pld   r12, symbol@got@pcrel
    ld    r12, 0(r12)
    pld   r12, symbol@got@pcrel
    lvx   v1, 0, r12
A compiler may generate a PC-relative addressing sequence to access static or restricted-visibility data, but must generate a PC-relative GOT-indirect sequence for extern data. Extern data may be satisfied from a statically or dynamically linked source, so the compiler must be conservative. The compiler and linker can cooperate to replace a PC-relative GOT-indirect sequence with a PC-relative sequence when the data reference is satisfied at static link time. See .
Position-independent executables or shared objects have a GOT in the data segment that holds addresses. When the system creates a memory image from the file, the GOT entries are updated to reflect the absolute virtual addresses that were assigned for the process. These data segments are private, while the text segments are shared.
In systems based on the Power Architecture, the GOT can be addressed with a single instruction if the GOT size is less than 65,536 bytes. A larger GOT requires more general code to access all of its entries. OpenPOWER-compliant processor hardware implementations and the linker optimizations described here work together to generate efficient code for applications with large GOTs. They use instruction fusion to combine multiple ISA instructions into a single internal operation.
Code Models
Compilers may provide different code models depending on the expected size of the TOC and the size of the entire executable or shared library.
Small code model: The TOC is accessed using 16-bit offsets from the TOC pointer. This limits the size of a single TOC to 64 KB. Position-independent code uses GOT-indirect addressing to access other objects in the binary.
Large code model: The TOC is accessed using 32-bit offsets from the TOC pointer, except for .sdata and .sbss, which are accessed using 16-bit offsets from the TOC pointer. This allows a TOC of at least 2 GB. Position-independent code uses GOT-indirect addressing to access other objects in the binary.
Medium code model: Like the large code model, the TOC is accessed using 32-bit offsets from the TOC pointer, except for .sdata and .sbss, which are accessed using 16-bit offsets. In addition, accesses to module-local code and data objects use TOC-pointer relative addressing with 32-bit offsets. Using TOC-pointer relative addressing removes a level of indirection, resulting in faster access and a smaller GOT. However, it limits the size of the entire binary to between 2 GB and 4 GB, depending on the placement of the TOC base.
The medium code model is the default for compilers, and it is applicable to most programs and libraries. The code examples in this document generally use the medium code model.
When linking medium and large code model relocatable objects, the linker should place the .sdata and .sbss sections near the TOC base.
A linker must allow linking of relocatable object files using different code models. This may be accomplished by sorting the constituent sections of the TOC so that sections that are accessed using 16-bit offsets are placed near the TOC base, by using multiple TOCs, or by some other method. The suggested allocation order of sections is provided in .
PC-relative addressing may be used with the medium code model.
Accesses to module-local code and data objects use PC-relative addressing with up to 34-bit offsets. Position-independent code uses PC-relative GOT-indirect addressing to access shared objects.
Function Prologue and Epilogue A function's prologue and epilogue are described in this section.
Function Prologue
A function's prologue establishes addressability by initializing a TOC pointer in register r2, if necessary, and a stack frame, if necessary, and may save any nonvolatile registers it uses. Not every function needs to initialize a TOC pointer, and not every function needs to preserve the existing value of r2. See for more information.
All functions have a global entry point (GEP) available to any caller and pointing to the beginning of the prologue. Some functions may have a secondary entry point to optimize the cost of TOC pointer management. In particular, functions within a common module sharing the same TOC base value in r2 may be entered using a secondary entry point (the local entry point or LEP) that may bypass the code that loads a suitable TOC pointer value into the r2 register. When a dynamic or global linker transfers control from a function to another function in the same module, it may choose (but is not required) to use the local entry point when the r2 register is known to hold a valid TOC base value. Function pointers shared between modules shall always use the global entry point to specify the address of a function.
When a linker causes control to transfer to a global entry point of a function that also has a local entry point, it must insert a glue code sequence that loads r12 with the global entry-point address. Code at the global entry point of a function that also has a local entry point can assume that register r12 points to the GEP. However, code at the global entry point of a function that does not have a separate local entry point cannot make any assumptions about the values of either r2 or r12.
Addresses between the global and local entry points must not be branch targets, either for function entry or referenced by program logic of the function, because a linker may rewrite the code sequence establishing addressability to a different, more optimized form.
For example, while linking a static module with a known load address in the first 2 GB of the address space, the following code sequence:
    addis r2, r12, .TOC.-func@ha
    addi  r2, r2, .TOC.-func@l
may be rewritten by a linker or assembler to an equivalent form that is faster due to instruction fusion, such as:
    lis   r2, .TOC.@ha
    addi  r2, r2, .TOC.@l
In addition to establishing addressability, the function prologue is responsible for the following functions:
Creating a stack frame when required
Saving any nonvolatile registers that are used by the function
Saving any limited-access bits that are used by the function, per the rules described in .
This ABI shall be used in conjunction with the Power Architecture that implements the mfocrf architecture level. Further, OpenPOWER-compliant processors shall implement implementation-defined bits in a manner that allows the results of multiple mfocrf instructions to be combined with an OR instruction; for example, to yield a word in r0 including all three preserved CRs as follows:
    mfocrf r0, crf2
    mfocrf r1, crf3
    or     r0, r0, r1
    mfocrf r1, crf4
    or     r0, r0, r1
Specifically, this allows each OpenPOWER-compliant processor implementation to set each field to hold either 0 or the correct in-order value of the corresponding CR field at the point where the mfocrf instruction is performed.
Assembly Language Syntax for Defining Entry Points
When a function has two entry points, the global entry point is defined as a symbol. The local entry point is defined with the .localentry assembler pseudo op.
my_func:
    addis r2, r12, (.TOC.-my_func)@ha
    addi  r2, r2, (.TOC.-my_func)@l
    .localentry my_func, .-my_func
    ...                          ; function definition
    blr
The corresponding table (see ) shows how to represent dual entry points in symbol tables in an ELF object file. It also defines the meaning of the second parameter of .localentry, which is put in the three most-significant bits of the st_other field in the ELF Symbol Table entry.
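The mfocrf guarantee above can be modeled in C to show why OR-ing the partial results is safe. In this sketch, the helpers are hypothetical. mfocrf_model() returns the selected 4-bit CR field in its architected position with every other field read as 0. That is one of the two behaviors the text permits; the other (other fields holding their correct values) gives the same OR result. CR0 is assumed to occupy the most-significant nibble of the 32-bit CR image.

```c
#include <stdint.h>

/* Mask of 4-bit CR field n (0..7) within the 32-bit CR image. */
static uint32_t cr_field_mask(unsigned n)
{
    return UINT32_C(0xF) << (28 - 4 * n);
}

/* One permitted mfocrf result: the selected field, all others 0. */
static uint32_t mfocrf_model(uint32_t cr, unsigned n)
{
    return cr & cr_field_mask(n);
}

/* The prologue sequence shown above: combine CR2, CR3 and CR4. */
static uint32_t save_cr234(uint32_t cr)
{
    uint32_t r0 = mfocrf_model(cr, 2);
    r0 |= mfocrf_model(cr, 3);
    r0 |= mfocrf_model(cr, 4);
    return r0;
}
```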
Function Epilogue
The purpose of the epilogue is to perform the following functions:
Restore all registers and limited-access bits that were saved by the function's prologue.
Restore the last stack frame.
Return to the caller.
Rules for Prologue and Epilogue Sequences
This ABI does not impose fixed function prologue and epilogue code sequences. However, several rules must be followed to ensure reliable and consistent call chain backtracing:
Before a function calls any other function, it shall establish its own stack frame, whose size shall be a multiple of 16 bytes.
In instances where a function's prologue creates a stack frame, the back-chain word of the stack frame shall be updated atomically with the value of the stack pointer (r1) when a back chain is implemented. (This must be supported as default by all ELF V2 ABI-compliant environments.) This task can be done by using one of the following Store Doubleword with Update instructions:
Store Doubleword with Update (stdu) instruction with the relevant negative displacement, for stack frames that are smaller than 32 KB
Store Doubleword with Update Indexed (stdux) instruction, for stack frames that are 32 KB or greater, where the negative size of the stack frame has been computed using addis and addi or ori instructions and then loaded into a volatile register
The function shall save the link register that contains its return address in the LR save doubleword of its caller's stack frame before calling another function.
The deallocation of a function's stack frame must be an atomic operation. This task can be accomplished by one of the following methods:
Increment the stack pointer by the identical value that it was originally decremented by in the prologue when the stack frame was created.
Load the stack pointer (r1) with the value in the back-chain word in the stack frame, if a back chain is present.
The calling sequence does not restrict how languages leverage the Local Variable Space of the stack frame. There is no restriction on the size of this section.
The Parameter Save Area shall be allocated by the caller.
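Two of the numeric rules above can be sketched in C: rounding the frame size up to a multiple of 16 bytes, and choosing between stdu and stdux. The helper names are hypothetical, and the 32 KB threshold is taken directly from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round a requested frame size up to the next multiple of 16 bytes. */
static uint32_t round_frame_size(uint32_t bytes)
{
    return (bytes + 15u) & ~UINT32_C(15);
}

/* True if the back chain can be stored with stdu and an immediate
 * negative displacement (frames smaller than 32 KB); otherwise the
 * negated size is built in a volatile register and stdux is used. */
static bool fits_stdu_displacement(uint32_t frame_bytes)
{
    return frame_bytes < 32u * 1024u;
}
```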
It shall be large enough to contain the parameters needed by the caller if a Parameter Save Area is needed (as described in ). Its contents are not saved across function calls. If any nonvolatile registers are to be used by the function, the contents of the register must be saved into a register save area. See for information on all of the optional register save areas. Saving or restoring nonvolatile registers used by the function can be accomplished by using in-line code. Alternately, one of the system subroutines described in may offer a more efficient alternative to in-line code, especially in cases where there are many registers to be saved or restored.
Register Save and Restore Functions

This section describes functions that can be used to save and restore the contents of nonvolatile registers. Using these routines, rather than saving and restoring registers in-line in each function's prologue and epilogue, can help reduce code size. The calling conventions of these functions are not standard, and executables or shared objects that use them must link them statically.

The register save and restore functions affect consecutive registers from register N through register 31, where N is a number between 14 and 31. Higher-numbered registers are saved at higher addresses within a save area. Each function described in this section is a family of functions with identical behavior except for the number and kind of registers affected.

Systems must provide three pairs of functions to save and restore general-purpose, floating-point, and vector registers. They may be implemented as multiple-entry-point routines or as individual routines. The specific calling conventions for each of these functions are described in the GPR, FPR, and Vector Save and Restore Functions sections that follow. Visibility rules are described in .
GPR Save and Restore Functions

Each _savegpr0_N routine saves the general registers rN–r31, inclusive, and also saves the LR. The stack frame must not have been allocated yet. When the routine is called, r1 contains the address of the word immediately beyond the end of the general register save area, and r0 must contain the value of the LR on function entry.

The _restgpr0_N routines restore the general registers rN–r31 and then return to their caller's caller. The caller's stack frame must already have been deallocated. When the routine is called, r1 contains the address of the word immediately beyond the end of the general register save area, and the LR must contain the return address.

A sample implementation of _savegpr0_N and _restgpr0_N follows:

    _savegpr0_14: std  r14,-144(r1)
    _savegpr0_15: std  r15,-136(r1)
    _savegpr0_16: std  r16,-128(r1)
    _savegpr0_17: std  r17,-120(r1)
    _savegpr0_18: std  r18,-112(r1)
    _savegpr0_19: std  r19,-104(r1)
    _savegpr0_20: std  r20,-96(r1)
    _savegpr0_21: std  r21,-88(r1)
    _savegpr0_22: std  r22,-80(r1)
    _savegpr0_23: std  r23,-72(r1)
    _savegpr0_24: std  r24,-64(r1)
    _savegpr0_25: std  r25,-56(r1)
    _savegpr0_26: std  r26,-48(r1)
    _savegpr0_27: std  r27,-40(r1)
    _savegpr0_28: std  r28,-32(r1)
    _savegpr0_29: std  r29,-24(r1)
    _savegpr0_30: std  r30,-16(r1)
    _savegpr0_31: std  r31,-8(r1)
                  std  r0,16(r1)      # save LR in the caller's LR save doubleword
                  blr

    _restgpr0_14: ld   r14,-144(r1)
    _restgpr0_15: ld   r15,-136(r1)
    _restgpr0_16: ld   r16,-128(r1)
    _restgpr0_17: ld   r17,-120(r1)
    _restgpr0_18: ld   r18,-112(r1)
    _restgpr0_19: ld   r19,-104(r1)
    _restgpr0_20: ld   r20,-96(r1)
    _restgpr0_21: ld   r21,-88(r1)
    _restgpr0_22: ld   r22,-80(r1)
    _restgpr0_23: ld   r23,-72(r1)
    _restgpr0_24: ld   r24,-64(r1)
    _restgpr0_25: ld   r25,-56(r1)
    _restgpr0_26: ld   r26,-48(r1)
    _restgpr0_27: ld   r27,-40(r1)
    _restgpr0_28: ld   r28,-32(r1)
    _restgpr0_29: ld   r0,16(r1)       # reload the saved LR
                  ld   r29,-24(r1)
                  mtlr r0
                  ld   r30,-16(r1)
                  ld   r31,-8(r1)
                  blr                 # return to the caller's caller
    _restgpr0_30: ld   r30,-16(r1)
    _restgpr0_31: ld   r0,16(r1)
                  ld   r31,-8(r1)
                  mtlr r0
                  blr

Each _savegpr1_N routine saves the general registers rN–r31, inclusive. When the routine is called, r12 contains the address of the word just beyond the end of the general register save area.

The _restgpr1_N routines restore the general registers rN–r31. When the routine is called, r12 contains the address of the word just beyond the end of the general register save area, superseding the normal use of r12 on a call.

A sample implementation of _savegpr1_N and _restgpr1_N follows:

    _savegpr1_14: std  r14,-144(r12)
    _savegpr1_15: std  r15,-136(r12)
    _savegpr1_16: std  r16,-128(r12)
    _savegpr1_17: std  r17,-120(r12)
    _savegpr1_18: std  r18,-112(r12)
    _savegpr1_19: std  r19,-104(r12)
    _savegpr1_20: std  r20,-96(r12)
    _savegpr1_21: std  r21,-88(r12)
    _savegpr1_22: std  r22,-80(r12)
    _savegpr1_23: std  r23,-72(r12)
    _savegpr1_24: std  r24,-64(r12)
    _savegpr1_25: std  r25,-56(r12)
    _savegpr1_26: std  r26,-48(r12)
    _savegpr1_27: std  r27,-40(r12)
    _savegpr1_28: std  r28,-32(r12)
    _savegpr1_29: std  r29,-24(r12)
    _savegpr1_30: std  r30,-16(r12)
    _savegpr1_31: std  r31,-8(r12)
                  blr

    _restgpr1_14: ld   r14,-144(r12)
    _restgpr1_15: ld   r15,-136(r12)
    _restgpr1_16: ld   r16,-128(r12)
    _restgpr1_17: ld   r17,-120(r12)
    _restgpr1_18: ld   r18,-112(r12)
    _restgpr1_19: ld   r19,-104(r12)
    _restgpr1_20: ld   r20,-96(r12)
    _restgpr1_21: ld   r21,-88(r12)
    _restgpr1_22: ld   r22,-80(r12)
    _restgpr1_23: ld   r23,-72(r12)
    _restgpr1_24: ld   r24,-64(r12)
    _restgpr1_25: ld   r25,-56(r12)
    _restgpr1_26: ld   r26,-48(r12)
    _restgpr1_27: ld   r27,-40(r12)
    _restgpr1_28: ld   r28,-32(r12)
    _restgpr1_29: ld   r29,-24(r12)
    _restgpr1_30: ld   r30,-16(r12)
    _restgpr1_31: ld   r31,-8(r12)
                  blr
FPR Save and Restore Functions

Each _savefpr_N routine saves the floating-point registers fN–f31, inclusive, and also saves the LR. When the routine is called, r1 contains the address of the word immediately beyond the end of the Floating-Point Register Save Area, which means that the stack frame must not have been allocated yet. Register r0 must contain the value of the LR on function entry.

The _restfpr_N routines restore the floating-point registers fN–f31, inclusive, and then return to their caller's caller. When the routine is called, r1 contains the address of the word immediately beyond the end of the Floating-Point Register Save Area, which means that the stack frame must already have been deallocated.

It is incorrect to call both _savefpr_M and _savegpr0_M in the same prologue, or _restfpr_M and _restgpr0_M in the same epilogue. It is correct to call _savegpr1_M and _savefpr_M in either order, and to call _restgpr1_M and then _restfpr_M.

A sample implementation of _savefpr_N and _restfpr_N follows:

    _savefpr_14: stfd f14,-144(r1)
    _savefpr_15: stfd f15,-136(r1)
    _savefpr_16: stfd f16,-128(r1)
    _savefpr_17: stfd f17,-120(r1)
    _savefpr_18: stfd f18,-112(r1)
    _savefpr_19: stfd f19,-104(r1)
    _savefpr_20: stfd f20,-96(r1)
    _savefpr_21: stfd f21,-88(r1)
    _savefpr_22: stfd f22,-80(r1)
    _savefpr_23: stfd f23,-72(r1)
    _savefpr_24: stfd f24,-64(r1)
    _savefpr_25: stfd f25,-56(r1)
    _savefpr_26: stfd f26,-48(r1)
    _savefpr_27: stfd f27,-40(r1)
    _savefpr_28: stfd f28,-32(r1)
    _savefpr_29: stfd f29,-24(r1)
    _savefpr_30: stfd f30,-16(r1)
    _savefpr_31: stfd f31,-8(r1)
                 std  r0,16(r1)       # save LR in the caller's LR save doubleword
                 blr

    _restfpr_14: lfd  f14,-144(r1)
    _restfpr_15: lfd  f15,-136(r1)
    _restfpr_16: lfd  f16,-128(r1)
    _restfpr_17: lfd  f17,-120(r1)
    _restfpr_18: lfd  f18,-112(r1)
    _restfpr_19: lfd  f19,-104(r1)
    _restfpr_20: lfd  f20,-96(r1)
    _restfpr_21: lfd  f21,-88(r1)
    _restfpr_22: lfd  f22,-80(r1)
    _restfpr_23: lfd  f23,-72(r1)
    _restfpr_24: lfd  f24,-64(r1)
    _restfpr_25: lfd  f25,-56(r1)
    _restfpr_26: lfd  f26,-48(r1)
    _restfpr_27: lfd  f27,-40(r1)
    _restfpr_28: lfd  f28,-32(r1)
    _restfpr_29: ld   r0,16(r1)        # reload the saved LR
                 lfd  f29,-24(r1)
                 mtlr r0
                 lfd  f30,-16(r1)
                 lfd  f31,-8(r1)
                 blr                  # return to the caller's caller
    _restfpr_30: lfd  f30,-16(r1)
    _restfpr_31: ld   r0,16(r1)
                 lfd  f31,-8(r1)
                 mtlr r0
                 blr
Vector Save and Restore Functions

Each _savevr_M routine saves the vector registers vM–v31, inclusive. On entry to this function, r0 contains the address of the word just beyond the end of the Vector Register Save Area. The routines leave r0 undisturbed. They modify the value of r12.

The _restvr_M routines restore the vector registers vM–v31, inclusive. On entry to this function, r0 contains the address of the word just beyond the end of the Vector Register Save Area. The routines leave r0 undisturbed. They modify the value of r12.

It is valid to call _savevr_M before any of the other register save functions, or after _savegpr1_M. It is valid to call _restvr_M before any of the other register restore functions, or after _restgpr1_M.

A sample implementation of _savevr_M and _restvr_M follows. In each addi, the rA operand is r0, which addi reads as the value 0 rather than the contents of r0. Each addi therefore loads a constant displacement into r12, and stvx or lvx forms the effective address as r12 + r0.

    _savevr_20: addi r12,r0,-192     # r12 = -192 (rA=0 reads as 0)
                stvx v20,r12,r0      # save v20
    _savevr_21: addi r12,r0,-176
                stvx v21,r12,r0      # save v21
    _savevr_22: addi r12,r0,-160
                stvx v22,r12,r0      # save v22
    _savevr_23: addi r12,r0,-144
                stvx v23,r12,r0      # save v23
    _savevr_24: addi r12,r0,-128
                stvx v24,r12,r0      # save v24
    _savevr_25: addi r12,r0,-112
                stvx v25,r12,r0      # save v25
    _savevr_26: addi r12,r0,-96
                stvx v26,r12,r0      # save v26
    _savevr_27: addi r12,r0,-80
                stvx v27,r12,r0      # save v27
    _savevr_28: addi r12,r0,-64
                stvx v28,r12,r0      # save v28
    _savevr_29: addi r12,r0,-48
                stvx v29,r12,r0      # save v29
    _savevr_30: addi r12,r0,-32
                stvx v30,r12,r0      # save v30
    _savevr_31: addi r12,r0,-16
                stvx v31,r12,r0      # save v31
                blr                  # return to prologue

    _restvr_20: addi r12,r0,-192
                lvx  v20,r12,r0      # restore v20
    _restvr_21: addi r12,r0,-176
                lvx  v21,r12,r0      # restore v21
    _restvr_22: addi r12,r0,-160
                lvx  v22,r12,r0      # restore v22
    _restvr_23: addi r12,r0,-144
                lvx  v23,r12,r0      # restore v23
    _restvr_24: addi r12,r0,-128
                lvx  v24,r12,r0      # restore v24
    _restvr_25: addi r12,r0,-112
                lvx  v25,r12,r0      # restore v25
    _restvr_26: addi r12,r0,-96
                lvx  v26,r12,r0      # restore v26
    _restvr_27: addi r12,r0,-80
                lvx  v27,r12,r0      # restore v27
    _restvr_28: addi r12,r0,-64
                lvx  v28,r12,r0      # restore v28
    _restvr_29: addi r12,r0,-48
                lvx  v29,r12,r0      # restore v29
    _restvr_30: addi r12,r0,-32
                lvx  v30,r12,r0      # restore v30
    _restvr_31: addi r12,r0,-16
                lvx  v31,r12,r0      # restore v31
                blr                  # return to epilogue
Function Pointers A function's address is defined to be its global entry point. Function pointers shall contain the global entry-point address.
Static Data Objects

Data objects with static storage duration are described here. Stack-resident data objects are omitted because their virtual addresses are derived relative to the stack or frame pointers. Heap data objects are omitted because they are accessed through a program pointer.

The only instructions that can access memory in the Power Architecture are load and store instructions. Power Architecture instructions cannot hold 64-bit addresses directly. Programs therefore typically place the address of the memory location into a register and access the location indirectly through that register. For symbolic references in absolute code, the values of symbols or their absolute virtual addresses are placed directly into instructions. The Absolute Load and Store Example shows this method.

Examples of absolute and position-independent compilations are shown in these figures:

- Absolute Load and Store Example
- Small Model Position-Independent Load and Store (DSO)
- Medium or Large Model Position-Independent Load and Store (DSO)
- PC-Relative Load and Store

These examples show the C language statements together with the generated assembly language. The figures assume that only executables can use absolute addressing, while shared objects must use position-independent code addressing. They are intended to demonstrate the compilation of each C statement independently of its context; hence, the code can contain redundant operations.

Absolute addressing efficiency depends on the memory-region addresses:

- Top 32 KB: addressed directly with load and store D forms.
- Top 2 GB: addressed by a two-instruction sequence consisting of an lis with load and store D forms.
- Remaining addresses: more than two instructions.
- Bottom 2 GB: addressed by a two-instruction sequence consisting of an lis with load and store D forms.
- Bottom 32 KB: addressed directly with load and store D forms.
Absolute Load and Store Example

C Code:

    extern int src;
    extern int dst;
    extern int *ptr;

    dst = src;
    ptr = &dst;
    *ptr = src;

Assembly Code:

    .extern src
    .extern dst
    .extern ptr
    .section ".text"

    # dst = src;
    lis r9,src@ha
    lwz r9,src@l(r9)
    lis r11,dst@ha
    stw r9,dst@l(r11)

    # ptr = &dst;
    lis r11,ptr@ha
    lis r9,dst@ha
    la  r9,dst@l(r9)
    std r9,ptr@l(r11)

    # *ptr = src;
    lis r11,ptr@ha
    ld  r11,ptr@l(r11)     # ptr is a 64-bit pointer, so ld rather than lwz
    lis r9,src@ha
    lwz r9,src@l(r9)
    stw r9,0(r11)
Small Model Position-Independent Load and Store (DSO)

C Code:

    extern int src;
    extern int dst;
    extern int *ptr;

    dst = src;
    ptr = &dst;
    *ptr = src;

Assembly Code:

    .extern src
    .extern dst
    .extern ptr
    .section ".text"
    # TOC base in r2

    # dst = src;
    ld  r9,src@got(r2)
    lwz r0,0(r9)
    ld  r9,dst@got(r2)
    stw r0,0(r9)

    # ptr = &dst;
    ld  r9,ptr@got(r2)
    ld  r0,dst@got(r2)
    std r0,0(r9)

    # *ptr = src;
    ld  r9,ptr@got(r2)
    ld  r11,0(r9)
    ld  r9,src@got(r2)
    lwz r0,0(r9)
    stw r0,0(r11)
Medium or Large Model Position-Independent Load and Store (DSO)

C Code:

    extern int src;
    extern int dst;
    int *ptr;

    dst = src;
    ptr = &dst;
    *ptr = src;

Assembly Code:

    .extern src
    .extern dst
    .extern ptr
    .section ".text"
    # Assumes the TOC pointer is in r2

    # dst = src;
    addis r6,r2,src@got@ha
    ld    r6,src@got@l(r6)
    addis r7,r2,dst@got@ha
    ld    r7,dst@got@l(r7)
    lwz   r0,0(r6)
    stw   r0,0(r7)

    # ptr = &dst;
    addis r6,r2,dst@got@ha
    ld    r6,dst@got@l(r6)
    addis r7,r2,ptr@got@ha
    ld    r7,ptr@got@l(r7)
    std   r6,0(r7)           # ptr is a 64-bit pointer, so std rather than stw

    # *ptr = src;
    addis r6,r2,src@got@ha
    ld    r6,src@got@l(r6)
    addis r7,r2,ptr@got@ha
    ld    r7,ptr@got@l(r7)
    ld    r7,0(r7)
    lwz   r0,0(r6)
    stw   r0,0(r7)
PC-Relative Load and Store

C Code:

    extern int src;
    extern int dst;
    int *ptr;

    dst = src;
    ptr = &dst;
    *ptr = src;

Assembly Code:

    .extern src
    .extern dst
    .extern ptr
    .section ".text"

    # dst = src;
    plwz  r9,src@pcrel
    pstw  r9,dst@pcrel

    # ptr = &dst;
    paddi r11,dst@pcrel
    pstd  r11,ptr@pcrel

    # *ptr = src;
    pld   r11,ptr@pcrel
    plwz  r9,src@pcrel
    stw   r9,0(r11)
Due to fusion hardware support, the preferred code forms are destructive addressing forms. These consist of an addis that specifies a set of high-order bits, followed immediately by a destructive load that uses the same target register as the addis. Together they load data from a signed 32-bit offset from a base register. (Destructive in this context refers to a code sequence where the intermediate result computed by the first instruction is overwritten, that is, "destroyed", by the result of the second instruction, so that only one result register is produced. Fusion can then give the same performance as a single load instruction with a 32-bit displacement.)

For TOC-based PIC code (see the Small Model and the Medium or Large Model Position-Independent Load and Store figures), the assembly syntax symbol@got gives the offset in the Global Offset Table where the value of the symbol is stored. The GOT entry at that offset holds the address of the variable named "symbol". The offset for this assembly syntax cannot be larger than 16 bits. Where the offset is larger than 16 bits, the following assembly syntax is used for offsets up to 32 bits:

- High (32-bit) adjusted part of the offset: symbol@got@ha. Causes a linker error if the offset is larger than 32 bits.
- High (32-bit) part of the offset: symbol@got@h. Causes a linker error if the offset is larger than 32 bits.
- Low part of the offset: symbol@got@l.

To obtain the 16-bit segments of a 64-bit offset, the following operators may be used:

- Highest (most-significant 16 bits) adjusted part of the offset: symbol@highesta
- Highest (most-significant 16 bits) part of the offset: symbol@highest
- Higher (next significant 16 bits) adjusted part of the offset: symbol@highera
- Higher (next significant 16 bits) part of the offset: symbol@higher
- High (next significant 16 bits) adjusted part of the offset: symbol@higha
- High (next significant 16 bits) part of the offset: symbol@high
- Low part of the offset: symbol@l

If the instruction using symbol@got@l has a signed immediate operand (for example, addi), use symbol@got@ha (high adjusted) for the high part of the offset. If it has an unsigned immediate operand (for example, ori), use symbol@got@h. For a description of high-adjusted values, see .
Function Calls

Direct function calls are made in programs with the Power Architecture bl instruction. Because the branch displacement in the instruction is self-relative, a bl instruction can reach 32 MB backward or forward from the current position. Therefore, when a bl instruction is used to make a function call, the size of the text segment in an executable or shared object is constrained. When the called function is beyond the reach of the bl instruction, a linker implementation may either introduce branch trampoline code to extend the call distance or issue a link error.

As shown in the Direct Function Call figure, the bl instruction is generally used to call a local function. There are two possibilities for the location of the called function with respect to the caller:

The called function is in the same executable or shared object as the caller. In this case, the symbol is resolved by the link editor, and the bl instruction branches directly to the called function.
Direct Function Call
The called function is not in the same executable or shared object as the caller. In this case, the symbol cannot be resolved directly by the link editor. The link editor instead generates a branch to glue code that loads the address of the function from the Procedure Linkage Table. See .

For indirect function calls, the address of the function to be called is placed in r12 and in the CTR register, and a bctrl instruction performs the indirect branch. This is shown in the Indirect Function Call (Absolute Medium Model), Small-Model Position-Independent Indirect Function Call, and Large-Model Position-Independent Indirect Function Call figures. The ELF V2 ABI requires the address of the called function to be in r12 when a cross-module function call is made.
Indirect Function Call (Absolute Medium Model)
The Small-Model Position-Independent Indirect Function Call figure shows how to make an indirect function call using small-model position-independent code.
Small-Model Position-Independent Indirect Function Call
The Large-Model Position-Independent Indirect Function Call figure shows how to make an indirect function call using large-model position-independent code.
Large-Model Position-Independent Indirect Function Call
The PC-Relative Position-Independent Indirect Function Call figure shows how to make an indirect function call using PC-relative addressing in a function that does not preserve r2.

PC-Relative Position-Independent Indirect Function Call

C Code:

    extern void function( );
    extern void (*ptrfunc) ( );

    ptrfunc = function;
    (*ptrfunc) ( );

Assembly Code:

    .section .text

    # ptrfunc = function;
    pld   r9,ptrfunc@got@pcrel
    pld   r0,function@got@pcrel
    std   r0,0(r9)

    # (*ptrfunc) ( );
    pld   r9,ptrfunc@got@pcrel
    ld    r12,0(r9)
    mtctr r12
    bctrl
Function calls often need to be performed in conjunction with establishing, maintaining, and restoring addressability through the TOC pointer register, r2. When a function is called, the TOC pointer register may be modified. In many cases, the caller must place a nop after the bl instruction that performs a call if r2 is not known to have the same value in the callee. This is generally true for external calls. If the caller and callee use different r2 values, the linker replaces the nop with an r2-restoring instruction; if they use the same r2 value, it leaves the nop unchanged. This scheme avoids having a compiler generate an overly conservative r2 save and restore around every external call.

There are two cases where the caller should not provide a nop after the bl instruction that performs a call:

- When the caller is not guaranteed to preserve r2 (see ); or
- When the callee is in the same compilation unit and is guaranteed to preserve r2.

In the first case, the bl instruction must be marked with an R_PPC64_REL24_NOTOC relocation. See .

For calls to functions resolved at run time, the linker must generate stub code to load the function address from the PLT. The stub code must also save r2 to 24(r1), except in two cases in which it can omit the r2 save:

- The call is marked with an R_PPC64_REL24_NOTOC relocation, as described above.
- The call is marked with an R_PPC64_TOCSAVE relocation that points to a nop provided in the caller's prologue. In this case, the linker replaces the prologue nop with an r2 save:

    tocsaveloc:
        nop
        ...
        bl target
        .reloc ., R_PPC64_TOCSAVE, tocsaveloc
        nop

The linker may assume that r2 is valid at the point of a call. Thus, stub code may use r2 to load an address from the PLT unless the call is marked with an R_PPC64_REL24_NOTOC relocation to indicate that r2 is not available.

The nop instruction must be:

    ori r0,r0,0

For more information, see , , and .
Branching

The flow of execution in a program is controlled by branch instructions. An unconditional branch instruction encodes a signed displacement, relative to the address of the branch, with a total range of 64 MB. It can therefore jump to locations up to 32 MB in either direction. The Branch Instruction Model figure shows the model for branch instructions.
Branch Instruction Model
Selecting one of multiple branches is accomplished in C with switch statements. When the case labels satisfy grouping constraints, the compiler implements the switch statement selections with an address table. To keep irrelevant details out of the examples that follow, they use these simplifying assumptions:

- r12 holds the selection expression.
- Case label constants begin at zero.
- The assembler names .Lcasei, .Ldefault, and .Ltab are used for the case labels, the default, and the address table, respectively.

For position-dependent code (for example, the main module of an application) loaded into the low or high address range, absolute addressing of a branch table yields the best performance.
Absolute Switch Code (Within) for static modules located in low or high 2 GB of address space
When the branch targets are located in the bottom 2 GB of the address space, a faster variant of this code may be used that uses the lwz instruction in place of the lwa instruction.
Absolute Switch Code (Beyond) for static modules beyond the top or bottom 2 GB of the address space
For position-independent code targeted at being dynamically loaded to different address ranges as a DSO, the preferred code pattern uses TOC-relative addressing. It takes advantage of the fact that the TOC pointer points to a fixed offset from the code segment. Storing offsets relative to the start address of the branch table ensures position independence when code is loaded at different addresses.
Position-Independent Switch Code for Small/Medium Models (preferred, with TOC-relative addressing)
For position-independent code targeted at being dynamically loaded to different address ranges as a DSO or a position-independent executable (PIE), the preferred code pattern uses TOC-indirect addresses for code models where the distance between the TOC and the branch table exceeds 2 GB. The use of relative offsets from the start address of the branch table ensures position independence when code is loaded at different addresses.
Position-Independent Switch Code for All Models (alternate, with GOT-indirect addressing)
The PIC Code that Avoids the lwa Instruction figure shows how, in the medium code model, PIC code can avoid the lwa instruction, which may result in lower performance in some POWER processor implementations.
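PIC Code that Avoids the lwa Instruction

        .text
    f1:
        addis r9,r2,.Ltab@ha      # high part of the table address
        sldi  r10,r3,2            # selector * 4
        addi  r9,r9,.Ltab@l       # r9 = address of .Ltab
        lwzx  r10,r10,r9          # r10 = .TOC. - target (zero-extended)
        sub   r10,r2,r10          # r10 = r2 - (.TOC. - target) = target
        mtctr r10
        bctr
    .Ltab:
        .long .TOC. - .Lcase0
        .long .TOC. - .Lcase1
        .long .TOC. - .Ldefault
        .long .TOC. - .Lcase3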
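The Position-Independent Switch Code (PC-Relative Addressing) figure shows a switch implementation for PC-relative compilation units.

Position-Independent Switch Code (PC-Relative Addressing)

C Code:

    switch (j) {
    case 0: ...
    case 1: ...
    case 3: ...
    default: ...
    }

Assembly Code:

        cmplwi r12,4              # unsigned compare of the selector with the table size
        bge    .Ldefault
        slwi   r12,r12,2          # scale the selector to a 4-byte table index
        paddi  r10,.Ltab@pcrel    # r10 = address of .Ltab
        lwax   r8,r10,r12         # r8 = sign-extended offset of the target from .Ltab
        add    r10,r8,r10         # r10 = target address
        mtctr  r10
        bctr

        .p2align 2
    .Ltab:
        .long (.Lcase0-.Ltab)
        .long (.Lcase1-.Ltab)
        .long (.Ldefault-.Ltab)
        .long (.Lcase3-.Ltab)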
Dynamic Stack Space Allocation

Once allocated, a stack frame may be grown or shrunk dynamically as many times as necessary during the lifetime of a function. Standard calling conventions must be maintained because a subfunction can be called after the current frame is grown. That subfunction may create, grow, shrink, and tear down its own frame between the caller's dynamic stack frame allocations. The following constraints apply when dynamically growing or shrinking a stack frame:

- Maintain 16-byte alignment.
- When a back chain is used, perform stack pointer adjustments atomically so that the value of the back-chain word is valid at all times.
- Maintain addressability to the previously allocated local variables in the presence of multiple dynamic allocations or conditional allocations.
- Ensure that other linkage information is correct, so that the function can return, or its stack space can be deallocated by exception handling, without first deallocating any dynamically allocated space.

Using a frame pointer is the recognized method for maintaining addressability to arguments or local variables. (This may be a pointer to the top of the stack frame, typically in r31.) For correct behavior in the cases of setjmp( ) and longjmp( ), the frame pointer shall be allocated in a nonvolatile general-purpose register.

The Before Dynamic Stack Allocation figure shows the organization of a stack frame before a dynamic allocation.
Before Dynamic Stack Allocation
Example Code to Allocate n Bytes

    #define n 13
    # char *a = alloca(n);
    # rnd(x) = x rounded down to a multiple of the stack alignment (16 bytes),
    #          so rnd(x+15) rounds x up to the next multiple of 16
    # psave  = size of the parameter save area (may be zero)
    # p      = 32 + rnd(psave+15): offset to the start of the dynamic allocation

        ld   r0,0(r1)               # Load the current back chain.
        stdu r0,-rnd(n+15)(r1)      # Store new back chain, quadword-aligned.
        addi r3,r1,p                # r3 = new data area following the parameter save area.
Because it is allowed (and common) to return without first deallocating this dynamically allocated memory, all the linkage information in the new location must be valid. Therefore, the CR save word and the TOC pointer doubleword must also be copied from their old locations to the new ones. The LR save doubleword does not need to be copied, because until this function makes a call, it does not contain a value that needs to be preserved. In the future, if it is defined and if the function uses the Reserved word, the LR save doubleword must also be copied. An allocation of variable size requires additional instructions.

If a dynamic deallocation will occur, the r1 stack pointer must be saved before the dynamic allocation, and the deallocation must reset r1 to that saved value. The deallocation does not need to copy any stack locations because the old ones are still valid. The After Dynamic Stack Allocation figure shows an example organization of a stack frame after a dynamic allocation.
After Dynamic Stack Allocation
DWARF Definition

Although this ABI itself does not define a debugging format, debug with attributed record formats (DWARF) is defined here for systems that implement the DWARF specification. For information about how to locate the specification, see .

The DWARF specification is used by compilers and debuggers to aid source-level or symbolic debugging. However, the format is not biased toward any particular compiler or debugger. Per the DWARF specification, a mapping from Power Architecture registers to register numbers is required as described in . All instances of the Power Architecture use the mapping shown in the Mappings of Common Registers table for encoding registers into DWARF. DWARF register numbers 32–63 and 77–108 are also used to indicate the location of variables in VSX registers vsr0–vsr31 and vsr32–vsr63, respectively, in DWARF debug information.

Mappings of Common Registers

    DWARF Register Number   Register Name   Register Width (Bytes)
    Reg 0–31                r0–r31          8
    Reg 32–63               f0–f31          8
    Reg 64                  Reserved        N/A
    Reg 65                  lr              8
    Reg 66                  ctr             8
    Reg 67                  Reserved        N/A
    Reg 68–75               cr0–cr7         0.5 (see note)
    Reg 76                  xer             4
    Reg 77–108              vr0–vr31        16
    Reg 109                 Reserved        N/A
    Reg 110                 vscr            8
    Reg 111                 Reserved        N/A
    Reg 112                 Reserved        N/A
    Reg 113                 Reserved        N/A
    Reg 114                 tfhar           8
    Reg 115                 tfiar           8
    Reg 116                 texasr          8

    Note: The CRx registers correspond to 4-bit fields within a word, where the offset of the 4-bit group within the word is a function of the CRFx number (x).
DWARF for the OpenPOWER ABI defines the address class codes described in the Address Class Codes table.

Address Class Codes

    Code        Value   Meaning
    ADDR_none   0       No class specified
Exception Handling

Where exceptions can be thrown or caught by a function, or thrown through that function, or where a thread can be canceled from within a function, the locations where nonvolatile registers have been saved must be described with unwind information. The format of this information is based on the DWARF call frame information with extensions.

Any implementation that generates unwind information must also provide exception handling functions that are the same as those described in the Itanium C++ ABI, which is the normative text on the issue. For information about how to locate this material, see .

When unwinding, care must be taken to restore the TOC pointer r2 if and only if it has been saved. It is recommended that the unwinder read the instruction at the return address in the link register and restore r2 if and only if that instruction is an explicit restore of r2, that is, ld r2,24(r1).