Object Files
ELF Header
The file class member of the ELF header identification array, e_ident[EI_CLASS], identifies the ELF file as 64-bit encoded by holding the value ELFCLASS64. For a big-endian encoded ELF file, the data encoding member of the ELF header identification array, e_ident[EI_DATA], holds the value 2, defined as data encoding ELFDATA2MSB. For a little-endian encoded ELF file, it holds the value 1, defined as data encoding ELFDATA2LSB.

  e_ident[EI_CLASS]   ELFCLASS64    For all 64-bit implementations.
  e_ident[EI_DATA]    ELFDATA2MSB   For all big-endian implementations.
  e_ident[EI_DATA]    ELFDATA2LSB   For all little-endian implementations.

The ELF header's e_flags member holds bit flags associated with the file. The 64-bit PowerPC processor family defines the following flags.

e_flags values defining the ABI level:
  0   For ELF object files of an unspecified nature.
  1   For the Power ELF V1 ABI using function descriptors. This ABI is currently only used for big-endian PowerPC implementations.
  2   For the OpenPOWER ELF V2 ABI using the facilities described here and including function pointers to directly reference functions.

The ABI version to be used for the ELF header file is specified with the .abiversion pseudo-op:

    .abiversion 2

Processor identification resides in the ELF header's e_machine member, and must have the value EM_PPC64, defined as the value 21.
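As an illustration of the header checks described above, the following minimal sketch decodes the relevant fields from the first 64 bytes of a file. The function name describe_ppc64_header is hypothetical, the field offsets (18 for e_machine, 48 for e_flags) are those of the generic Elf64_Ehdr layout, and the two-bit mask used to extract the ABI level is an assumption rather than something stated in this section.

```python
import struct

EI_CLASS, EI_DATA = 4, 5          # indices into e_ident
ELFCLASS64 = 2
ELFDATA2LSB, ELFDATA2MSB = 1, 2
EM_PPC64 = 21
ABI_LEVEL_MASK = 3                # assumption: the ABI level occupies the low two bits of e_flags


def describe_ppc64_header(header: bytes) -> dict:
    """Decode the ELF header fields discussed above (hypothetical helper)."""
    if len(header) < 64 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF64 header")
    if header[EI_CLASS] != ELFCLASS64:
        raise ValueError("not a 64-bit ELF file")
    data = header[EI_DATA]
    if data not in (ELFDATA2LSB, ELFDATA2MSB):
        raise ValueError("unknown data encoding")
    endian = "<" if data == ELFDATA2LSB else ">"
    # Elf64_Ehdr: e_machine is a 2-byte field at offset 18,
    # e_flags is a 4-byte field at offset 48.
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    (e_flags,) = struct.unpack_from(endian + "I", header, 48)
    return {
        "little_endian": data == ELFDATA2LSB,
        "is_ppc64": e_machine == EM_PPC64,
        "abi_level": e_flags & ABI_LEVEL_MASK,
    }
```

A caller would typically read the first 64 bytes of an object file and reject it unless is_ppc64 is true and abi_level is 2 (or 0 for files of an unspecified nature).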
Special Sections
The Special Sections table lists the sections that are used in the Power Architecture to hold program and control information. It also shows their types and attributes.

Special Sections
  Section Name   Type           Attributes
  .got           SHT_PROGBITS   SHF_ALLOC + SHF_WRITE
  .toc           SHT_PROGBITS   SHF_ALLOC + SHF_WRITE
  .plt (note)    SHT_NOBITS     SHF_ALLOC + SHF_WRITE
  .sdata         SHT_PROGBITS   SHF_ALLOC + SHF_WRITE
  .sbss          SHT_NOBITS     SHF_ALLOC + SHF_WRITE
  .data1         SHT_PROGBITS   SHF_ALLOC + SHF_WRITE
  .bss1          SHT_NOBITS     SHF_ALLOC + SHF_WRITE

Note: The type of the OpenPOWER ABI .plt section is SHT_NOBITS, not SHT_PROGBITS as on most other processors.
Suggested uses of these special sections follow: The .got section may hold the Global Offset Table (GOT). This section is not normally present in a relocatable object file because it is linker generated. The linker must ensure that .got is aligned to an 8-byte boundary. In an executable or shared library, it may contain part or all of the TOC. For more information, see and . The .toc section may hold the initialized TOC. The .toc section must be aligned to an 8-byte boundary. Address elements within .toc must be aligned to 8-byte boundaries to support linker optimization of the .toc section. In a relocatable object file, .toc may contain addresses of objects and functions; in this respect it may be thought of as a compiler-managed GOT. It may also contain other constants or variables; in this respect it is like .sdata. In an executable or shared library, it may contain part or the entirety of the TOC. For more information, see , , and . The .plt section may hold the procedure linkage table. This section is not normally present in a relocatable object file because it is linker generated. Each entry within the .plt section is an 8-byte address. The linker must ensure that .plt is aligned to an 8-byte boundary. For more information, see . The .sdata section may hold initialized small-sized data. For more information, see . The .sbss section may hold uninitialized small-sized data. The .data1 section may hold initialized medium-sized data. The .bss1 section may hold uninitialized medium-sized data. Tools that support this ABI are not required to use these sections. However, if a tool uses these sections, it must assign the types and attributes specified in . Tools are not required to use the sections precisely as suggested. Relocation information and the code that refers to it define the actual use of a section.
TOC
The TOC is part of the data segment of an executable program. This section describes a common layout of the TOC in an executable file or shared object. Particular tools are not required to follow the layout specified here.

The TOC region commonly includes data items within the .got, .toc, .sdata, and .sbss sections. In the medium code model, they can be addressed with 32-bit signed offsets from the TOC pointer register. The TOC pointer register typically points to the beginning of the .got section + 0x8000, which permits a 2 GB TOC with the medium and large code models. The .got section is typically created by the link editor based on @got relocations. The .toc section is typically included from relocatable object files referenced during the link phase.

The TOC may straddle the boundary between initialized and uninitialized data in the data segment. The common order of sections in the data segment, some of which may be empty, follows:

  .rodata
  .data
  .data1
  .got
  .toc
  .sdata
  .sbss
  .plt
  .bss1
  .bss

The medium code model is expected to provide a sufficiently large TOC to provide all data addressing needs of a module with a single TOC.

Compilers may generate two-instruction medium code model references (or, if selected, short displacement one-instruction references) for all data items that are in the TOC for the object file being compiled. Such references are relative to the TOC pointer register, r2. (The linker may optimize two-instruction forms to one-instruction forms, replacing the first instruction of the two-instruction form with a nop and rewriting the second instruction. Consequently, the TOC pointer must be live during the first and second instruction of a two-instruction reference.)
Modules Containing Multiple TOCs The link editor may create multiple TOCs. In such a case, the constituent .got, .toc, .sdata, and .sbss sections are conceptually repeated as necessary, with each TOC typically using a TOC pointer value of its base plus 0x8000. Any constituent section of type SHT_NOBITS in any TOC but the last is converted to type SHT_PROGBITS filled with zeros. When multiple TOCs are present, linking must take care to save, initialize, and restore TOC pointers within a single module when calling from one function to a second function using a different TOC pointer value. Many of the same issues associated with a cross-module call apply also to calls within a module but using different TOC pointers.
Symbol Table
Symbol Values
An executable file that contains a symbol reference that is to be resolved dynamically by an associated shared object will have a symbol table entry for that symbol. This entry will identify the symbol as undefined by setting the st_shndx member to SHN_UNDEF.

The OpenPOWER ABI uses the three most-significant bits in the symbol st_other field to specify the number of instructions between a function's global entry point and local entry point. The global entry point is used when it is necessary to set up the TOC pointer (r2) for the function. The local entry point is used when r2 is known to already be valid for the function. A value of zero in these bits asserts that the function does not use r2.

The values of these three most-significant bits of the st_other field have the following meanings:

  0   The local and global entry points are the same, and the function has a single entry point with no requirement on r12 or r2. On return, r2 will contain the same value as at entry. This value should be used for functions that do not require the use of a TOC register to access external data. In particular, functions that do not access data through the TOC pointer can use a common entry point for the local and global entry points. If the function is not a leaf function, it must call subroutines using the R_PPC64_REL24_NOTOC relocation to indicate that the TOC register is not initialized. In turn, this may lead to more expensive procedure linkage table (PLT) stub code than would be necessary if a TOC register were initialized.

  1   The local and global entry points are the same, and r2 should be treated as caller-saved for local and global callers.

  2   The local entry point is at one instruction past the global entry point. When called at the global entry point, r12 must be set to the function entry address. r2 will be set to the TOC base that this function needs, so it must be preserved and restored by the caller. When called at the local entry point, r12 is not used and r2 must already point to the TOC base that this function needs, and it will be preserved.

  3   The local entry point is at two instructions past the global entry point. When called at the global entry point, r12 must be set to the function entry address. r2 will be set to the TOC base that this function needs, so it must be preserved and restored by the caller. When called at the local entry point, r12 is not used and r2 must already point to the TOC base that this function needs, and it will be preserved.

  4   The local entry point is at four instructions past the global entry point. When called at the global entry point, r12 must be set to the function entry address. r2 will be set to the TOC base that this function needs, so it must be preserved and restored by the caller. When called at the local entry point, r12 is not used and r2 must already point to the TOC base that this function needs, and it will be preserved.

  5   The local entry point is at eight instructions past the global entry point. When called at the global entry point, r12 must be set to the function entry address. r2 will be set to the TOC base that this function needs, so it must be preserved and restored by the caller. When called at the local entry point, r12 is not used and r2 must already point to the TOC base that this function needs, and it will be preserved.

  6   The local entry point is at 16 instructions past the global entry point. When called at the global entry point, r12 must be set to the function entry address. r2 will be set to the TOC base that this function needs, so it must be preserved and restored by the caller. When called at the local entry point, r12 is not used and r2 must already point to the TOC base that this function needs, and it will be preserved.

  7   Reserved

The local-entry-point handling field of st_other is generated with the .localentry pseudo-op.
The following is an example using the medium code model:

        .globl my_func
        .type my_func, @function
    my_func:
        addis r2, r12, (.TOC.-my_func)@ha
        addi r2, r2, (.TOC.-my_func)@l
        .localentry my_func, .-my_func
        ... ; function definition
        blr

Functions called via symbols with an st_other value of 0 may be called without a valid TOC pointer in r2. Symbols of functions that require a local entry with a valid TOC pointer should generate a symbol with an st_other field value of 2–6 and both local and global entry points, even if the global entry point will not be used. (In such a case, the instructions of the global entry setup sequence may optionally be initialized with TRAP instructions.)

For very large programs, a 32-bit offset from the TOC base may not suffice to reach all function addresses. In this case, the large program model must be used, and the above sequence is replaced by:

        .globl my_func
        .type my_func, @function
        .quad .TOC.-my_func
    my_func:
        .reloc ., R_PPC64_ENTRY ; optional
        ld r2,-8(r12)
        add r2,r2,r12
        .localentry my_func, .-my_func
        ... ; function definition
        blr

The linker will resolve .TOC.-my_func to a 64-bit offset stored 8 bytes prior to the global entry point. The prologue code then forms the absolute address of the TOC base.

Optionally, the linker may optimize the prologue sequence for functions that are within 2 GB of the TOC base. To facilitate this, the compiler may associate an R_PPC64_ENTRY relocation with the global entry point. Note that this relocation simply provides a hint, and imposes no obligation on the linker to optimize the prologue sequence. Nor does the absence of this relocation forbid the linker from optimizing the prologue sequence.
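The st_other encoding described in this section can be decoded with a small lookup. The sketch below is illustrative only: the names STO_LOCAL_SHIFT, STO_LOCAL_MASK, and local_entry_offset are hypothetical, and it assumes that every instruction between the two entry points is a 4-byte (non-prefixed) instruction.

```python
STO_LOCAL_SHIFT = 5                       # the three most-significant bits of the 8-bit st_other
STO_LOCAL_MASK = 0x7 << STO_LOCAL_SHIFT

# Instructions between the global and local entry points,
# indexed by the 3-bit field value (7 is reserved).
_INSNS_BETWEEN_ENTRIES = {0: 0, 1: 0, 2: 1, 3: 2, 4: 4, 5: 8, 6: 16}


def local_entry_offset(st_other: int) -> int:
    """Return the byte offset of the local entry point from the global one."""
    field = (st_other & STO_LOCAL_MASK) >> STO_LOCAL_SHIFT
    if field == 7:
        raise ValueError("st_other local-entry field value 7 is reserved")
    # Assumption: each instruction counted here is 4 bytes long.
    return _INSNS_BETWEEN_ENTRIES[field] * 4
```

For the medium code model example above, the two-instruction addis/addi prologue corresponds to a field value of 3, so the local entry point lies 8 bytes past my_func.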
Use of the Small Data Area For a data item in the .sdata or .sbss sections, a compiler may generate short-form one-instruction references. In an executable file or shared library, such a reference is relative to the address of the TOC base symbol (which can be obtained from r2 if a TOC pointer is initialized). A compiler that generates code using the small data area should provide an option to select the maximum size of objects placed in the small data area, and a means of disabling any use of the small data area. When generating code for ELF shared libraries, the small data area should not be used for default-visibility global objects. This is to satisfy ELF shared-library symbol interposition rules. That is, an ordinary global symbol in a shared library may be overridden by a symbol of the same name defined in the executable or another shared library. Supporting interposition when using TOC-pointer relative addressing would require text relocations.
Relocation Types The relocation entries in a relocatable file are used by the link editor to transform the contents of that file into an executable file or a shared object file. The application and result of a relocation are similar for both. Several relocatable files may be combined into one output file. The link editor merges the content of the files, sets the value of all function symbols, and performs relocations. The 64-bit OpenPOWER Architecture uses Elf64_Rela relocation entries exclusively. A relocation entry may operate upon a halfword, word, or doubleword. The r_offset member of the relocation entry designates the first byte of the address affected by the relocation. The subfield of r_offset affected by a relocation is implicit in the definition of the applied relocation type. The r_addend member of the relocation entry serves as the relocation addend, which is described in for each relocation type. A relocation type defines a set of instructions and calculations necessary to alter the subfield data of a particular relocation field.
Relocation Fields
The following relocation fields identify a subfield of an address affected by a relocation. Bit numbers are shown at the bottom of the boxes. (Only big-endian bit numbers are shown for space considerations.) Byte numbers are shown in the top of the boxes; big-endian byte numbers are displayed in the upper left corners and little-endian in the upper right corners. The byte order specified in a relocatable file’s ELF header applies to all the elements of a relocation entry, the relocation field definitions, and relocation type calculations.

In the following figure, doubleword64 specifies a 64-bit field occupying 8 bytes, the alignment of which is 8 bytes unless otherwise specified.
  [Figure: doubleword64 layout, bits 0–63]

In the following figure, word32 specifies a 32-bit field taking up 4 bytes and maintaining 4-byte alignment unless otherwise indicated.
  [Figure: word32 layout, bits 0–31]

In the following figure, word30 specifies a 30-bit field taking up bits 0–29 of a word and maintaining 4-byte alignment unless otherwise indicated.
  [Figure: word30 layout, bit boundaries 0, 29, 30, 31]

In the following figure, low24 specifies a 24-bit field taking up bits 6–29 of a word and maintaining 4-byte alignment. The other bits remain unchanged. A call or unconditional branch instruction is an example of this field.
  [Figure: low24 layout, bit boundaries 0, 5, 6, 29, 30, 31]

In the following figure, low21 specifies a 21-bit field occupying the least-significant bits of a word with 4-byte alignment.
  [Figure: low21 layout, bit boundaries 0, 10, 11, 31]

In the following figure, low14 specifies a 14-bit field taking up bits 16–29 and possibly bit 10 (the branch prediction bit) of a word and maintaining 4-byte alignment. The other bits remain unchanged. A conditional branch instruction is an example usage.
  [Figure: low14 layout, bit boundaries 0, 10, 15, 16, 29, 30, 31]

In the following figure, half16 specifies a 16-bit field taking up two bytes and maintaining 2-byte alignment. The immediate field of an Add Immediate instruction is an example of this field.
  [Figure: half16 layout, bits 0–15]

In the following figure, half16ds is similar to half16, but is really just 14 bits because the two least-significant bits must be zero and are not really part of the field. (Used by, for example, the ldu instruction.) In addition to the use of this relocation field with the DS forms, half16ds relocations are also used in conjunction with DQ forms. In those instances, the linker and assembler collaborate to create valid DQ forms. They raise an error if the specified offset does not meet the constraints of a valid DQ instruction form displacement.
  [Figure: half16ds layout, bits 0–15]

In the following figure, prefix34 specifies a 34-bit field split between bits 14–31 and 48–63 of a doubleword. The other bits remain unchanged. This is used by many PC-relative load and store instructions.
  [Figure: prefix34 layout, bit boundaries 0, 13, 14, 31, 32, 47, 48, 63]

In the following figure, prefix34ds is similar to prefix34, but is really just 32 bits because the two least-significant bits must be zero and are not really part of the field. This is used, for example, by the pld instruction.
  [Figure: prefix34ds layout, bit boundaries 0, 13, 14, 31, 32, 47, 48, 61, 62, 63]

In the following figure, prefix34dq is similar to prefix34, but is really just 31 bits because the three least-significant bits must be zero and are not really part of the field. This is used, for example, by the plxv instruction.
  [Figure: prefix34dq layout, bit boundaries 0, 13, 14, 31, 32, 47, 48, 60, 61, 63]

In the following figure, prefix28dq specifies a 25-bit field split between bits 20–31 and 48–60 of a doubleword. The other bits remain unchanged, and the 25-bit field is assumed to be concatenated with three zero bits on the right to form a 28-bit offset. This is used, for example, by the pmlxv instruction.
  [Figure: prefix28dq layout, bit boundaries 0, 19, 20, 31, 32, 47, 48, 60, 61, 63]
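To make the split prefix34 field concrete, the following sketch writes a signed 34-bit value into a prefix word and a suffix word. It assumes the two 32-bit words have already been read in the file's byte order, so that big-endian bits 14–31 of the doubleword are the low 18 bits of the prefix word and bits 48–63 are the low 16 bits of the suffix word. The function name insert_prefix34 is hypothetical.

```python
PREFIX_FIELD_MASK = 0x3FFFF   # low 18 bits of the prefix word (doubleword bits 14-31)
SUFFIX_FIELD_MASK = 0xFFFF    # low 16 bits of the suffix word (doubleword bits 48-63)


def insert_prefix34(prefix_word: int, suffix_word: int, value: int) -> tuple:
    """Write a signed 34-bit value into a prefix34 field, leaving other bits unchanged."""
    if not -(1 << 33) <= value < (1 << 33):
        raise OverflowError("value does not fit in a signed 34-bit field")
    field = value & ((1 << 34) - 1)           # two's-complement encoding of the value
    high18 = field >> 16
    low16 = field & SUFFIX_FIELD_MASK
    new_prefix = (prefix_word & ~PREFIX_FIELD_MASK & 0xFFFFFFFF) | high18
    new_suffix = (suffix_word & ~SUFFIX_FIELD_MASK & 0xFFFFFFFF) | low16
    return new_prefix, new_suffix
```

The range check corresponds to the relocation failure that an asterisk-flagged prefix34 relocation triggers when the computed value does not fit in the field.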
Relocation Notations
The following notations are used in the relocation table.

A
  Represents the addend used to compute the value of the relocatable field.

B
  Represents the base address at which a shared object file has been loaded into memory during execution. Generally, a shared object file is built with a 0 base virtual address, but the execution address will be different. See Program Header in the System V ABI for more information about the base address.

G
  Represents the offset from .TOC. at which the address of the relocation entry’s symbol resides during execution. This implies the creation of a .got section. For more information, see and . Reference in a calculation to the value G implicitly creates a GOT entry for the indicated symbol.

L
  Represents the section offset or address of the procedure linkage table entry for the symbol. This implies the creation of a .plt section if one does not already exist. It also implies the creation of a procedure linkage table (PLT) entry for resolving the symbol. For an unresolved symbol, the PLT entry points to a PLT resolver stub. For a resolved symbol, a procedure linkage table entry holds the final effective address of a dynamically resolved symbol (see ).

M
  Similar to G, except that the address that is stored may be the address of the procedure linkage table entry for the symbol.

P
  Represents the place (section offset or address) of the storage unit being relocated (computed using r_offset).

R
  Represents the offset of the symbol within the section in which the symbol is defined (its section-relative address).

S
  Represents the value of the symbol whose index resides in the relocation entry.

+
  Denotes 64-bit modulus addition.

–
  Denotes 64-bit modulus subtraction.

>>
  Denotes arithmetic right-shifting.

#lo(value)
  Denotes the least-significant 16 bits of the indicated value. That is:
    #lo(x) = (x & 0xffff)

#hi(value)
  Denotes bits 16–31 of the indicated value. That is:
    #hi(x) = x >> 16

#ha(value)
  Denotes the high adjusted value: bits 16–31 of the indicated value, compensating for #lo( ) being treated as a signed number. That is:
    #ha(x) = (x + 0x8000) >> 16

#higher(value)
  Denotes bits 32–47 of the indicated value. That is:
    #higher(x) = x >> 32

#highera(value)
  Denotes the higher adjusted value: bits 32–47 of the indicated value, compensating for #hi( ) being treated as a signed number. That is:
    #highera(x) = (x + 0x80000000) >> 32

#highest(value)
  Denotes bits 48–63 of the indicated value. That is:
    #highest(x) = x >> 48

#highesta(value)
  Denotes the highest adjusted value: bits 48–63 of the indicated value, compensating for #higher( ) being treated as a signed number. That is:
    #highesta(x) = (x + 0x800000000000) >> 48

TP
  The value of the thread pointer in general-purpose register r13.

TLS_TP_OFFSET
  The constant value 0x7000, representing the offset (in bytes) of the location that the thread pointer is initialized to point to, relative to the start of the thread local storage for the first initially available module.

TCB_LENGTH
  The constant value 0x8, representing the length of the thread control block (TCB) in bytes.

tcb
  Represents the base address of the TCB.
    tcb = (tp – (TLS_TP_OFFSET + TCB_LENGTH))

dtv
  Represents the base address of the dynamic thread vector (DTV).
    dtv = tcb[0]

dtpmod
  Represents the load module index of the load module that contains the definition of the symbol being relocated and is used to index the DTV.

dtprel
  Represents the offset of the symbol being relocated relative to the value of dtv[dtpmod].
    dtv[dtpmod] + dtprel = (S + A)

tprel
  Represents the offset of the symbol being relocated relative to the TP.
    tp + tprel = (S + A)

pcrel
  Represents the offset of the symbol being relocated relative to the current instruction address.

tlsgd
  Allocates two contiguous entries in the GOT to hold a tls_index structure, with values dtpmod and dtprel, and computes the offset from .TOC. of the first entry. If n is the offset computed:
    GOT[n] = dtpmod
    GOT[n + 1] = dtprel
  The call to __tls_get_addr ( ) happens as:
    __tls_get_addr ((tls_index *) &GOT[n])

tlsld
  Allocates two contiguous entries in the GOT to hold a tls_index structure, with values dtpmod and zero, and computes the offset from .TOC. of the first entry. If n is the offset computed:
    GOT[n] = dtpmod
    GOT[n + 1] = 0
  The call to __tls_get_addr ( ) happens as:
    __tls_get_addr ((tls_index *) &GOT[n])

tprelg
  Allocates an entry in the GOT with value tprel, and computes the offset from .TOC. of the entry. If n is the offset computed:
    GOT[n] = tprel
  The value of tprel is loaded into a register from the location (GOT + n) to be used in an r2 form instruction.

Relocations flagged with an asterisk (*) will trigger a relocation failure if the value computed does not fit in the field specified.
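The reason #ha( ) adds 0x8000 before shifting can be checked numerically. The sketch below implements #lo, #hi, and #ha as defined above (each truncated to the 16 bits that a half16 field stores) and reconstructs a 32-bit signed offset the way an addis/addi pair would, by sign-extending both halves. The helper names sext16 and rebuild32 are hypothetical.

```python
def lo(x: int) -> int:
    """#lo(x): the least-significant 16 bits."""
    return x & 0xFFFF


def hi(x: int) -> int:
    """#hi(x): bits 16-31, as stored in a 16-bit field."""
    return (x >> 16) & 0xFFFF


def ha(x: int) -> int:
    """#ha(x): bits 16-31, adjusted for #lo(x) being treated as signed."""
    return ((x + 0x8000) >> 16) & 0xFFFF


def sext16(v: int) -> int:
    """Sign-extend a 16-bit field value, as the hardware does for immediates."""
    return v - 0x10000 if v & 0x8000 else v


def rebuild32(x: int) -> int:
    """Recombine #ha and #lo the way 'addis rX,rB,ha ; addi rX,rX,lo' would."""
    return (sext16(ha(x)) << 16) + sext16(lo(x))
```

For an offset such as 0x12348000, #lo is 0x8000 (which is negative when sign-extended), so #ha yields 0x1235 rather than 0x1234; using #hi in its place would produce a result that is 0x10000 too small.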
Relocation Types Table
The following rules apply to the relocation types defined in the Relocation Table:

  For relocation types in which the names contain 14 or 16, the upper 49 bits of the value computed before shifting must all be the same.
  For relocation types in which the names contain 24, the upper 39 bits of the value computed before shifting must all be the same.
  For relocation types in which the names contain 14 or 24, the low 2 bits of the value computed before shifting must all be zero.

The relocation types whose Field column entry contains an asterisk (*) are subject to failure if the value computed does not fit in the allocated bits.

Relocations that refer to half16ds (56–66, 87–88, 91–92, 95–96, and 101–102) are to be used to direct the linker to look at the underlying instruction and treat the field as a DS or DQ field. ABI-compliant tools should give an error for attempts to relocate an address to a value that is not divisible by 4.

Relocation Table
Note: Relocation values 8, 9, 12, 13, 18, 23, 32, and 247 are not used. This is to maintain a correspondence to the relocation values used by the 32-bit PowerPC ELF ABI.

  Relocation Name              Value  Field          Expression
  R_PPC64_NONE                 0      none           none
  R_PPC64_ADDR32               1      word32*        S + A
  R_PPC64_ADDR24               2      low24*         (S + A) >> 2
  R_PPC64_ADDR16               3      half16*        S + A
  R_PPC64_ADDR16_LO            4      half16         #lo(S + A)
  R_PPC64_ADDR16_HI            5      half16*        #hi(S + A)
  R_PPC64_ADDR16_HA            6      half16*        #ha(S + A)
  R_PPC64_ADDR14               7      low14*         (S + A) >> 2
  R_PPC64_REL24                10     low24*         (S + A – P) >> 2
  R_PPC64_REL14                11     low14*         (S + A – P) >> 2
  R_PPC64_GOT16                14     half16*        G
  R_PPC64_GOT16_LO             15     half16         #lo(G)
  R_PPC64_GOT16_HI             16     half16*        #hi(G)
  R_PPC64_GOT16_HA             17     half16*        #ha(G)
  R_PPC64_COPY                 19     varies         See Relocation Descriptions.
  R_PPC64_GLOB_DAT             20     doubleword64   S + A
  R_PPC64_JMP_SLOT             21     doubleword64   See Relocation Descriptions.
  R_PPC64_RELATIVE             22     doubleword64   B + A
  R_PPC64_UADDR32              24     word32*        S + A
  R_PPC64_UADDR16              25     half16*        S + A
  R_PPC64_REL32                26     word32*        S + A – P
  R_PPC64_PLT32                27     word32*        L
  R_PPC64_PLTREL32             28     word32*        L – P
  R_PPC64_PLT16_LO             29     half16         #lo(L)
  R_PPC64_PLT16_HI             30     half16*        #hi(L)
  R_PPC64_PLT16_HA             31     half16*        #ha(L)
  R_PPC64_SECTOFF              33     half16*        R + A
  R_PPC64_SECTOFF_LO           34     half16         #lo(R + A)
  R_PPC64_SECTOFF_HI           35     half16*        #hi(R + A)
  R_PPC64_SECTOFF_HA           36     half16*        #ha(R + A)
  R_PPC64_REL30                37     word30         (S + A – P) >> 2
  R_PPC64_ADDR64               38     doubleword64   S + A
  R_PPC64_ADDR16_HIGHER        39     half16         #higher(S + A)
  R_PPC64_ADDR16_HIGHERA       40     half16         #highera(S + A)
  R_PPC64_ADDR16_HIGHEST       41     half16         #highest(S + A)
  R_PPC64_ADDR16_HIGHESTA      42     half16         #highesta(S + A)
  R_PPC64_UADDR64              43     doubleword64   S + A
  R_PPC64_REL64                44     doubleword64   S + A – P
  R_PPC64_PLT64                45     doubleword64   L
  R_PPC64_PLTREL64             46     doubleword64   L – P
  R_PPC64_TOC16                47     half16*        S + A – .TOC.
  R_PPC64_TOC16_LO             48     half16         #lo(S + A – .TOC.)
  R_PPC64_TOC16_HI             49     half16*        #hi(S + A – .TOC.)
  R_PPC64_TOC16_HA             50     half16*        #ha(S + A – .TOC.)
  R_PPC64_TOC                  51     doubleword64   .TOC.
  R_PPC64_PLTGOT16             52     half16*        M
  R_PPC64_PLTGOT16_LO          53     half16         #lo(M)
  R_PPC64_PLTGOT16_HI          54     half16*        #hi(M)
  R_PPC64_PLTGOT16_HA          55     half16*        #ha(M)
  R_PPC64_ADDR16_DS            56     half16ds*      (S + A) >> 2
  R_PPC64_ADDR16_LO_DS         57     half16ds       #lo(S + A) >> 2
  R_PPC64_GOT16_DS             58     half16ds*      G >> 2
  R_PPC64_GOT16_LO_DS          59     half16ds       #lo(G) >> 2
  R_PPC64_PLT16_LO_DS          60     half16ds       #lo(L) >> 2
  R_PPC64_SECTOFF_DS           61     half16ds*      (R + A) >> 2
  R_PPC64_SECTOFF_LO_DS        62     half16ds       #lo(R + A) >> 2
  R_PPC64_TOC16_DS             63     half16ds*      (S + A – .TOC.) >> 2
  R_PPC64_TOC16_LO_DS          64     half16ds       #lo(S + A – .TOC.) >> 2
  R_PPC64_PLTGOT16_DS          65     half16ds*      M >> 2
  R_PPC64_PLTGOT16_LO_DS       66     half16ds       #lo(M) >> 2
  R_PPC64_TLS                  67     none           none
  R_PPC64_DTPMOD64             68     doubleword64   @dtpmod
  R_PPC64_TPREL16              69     half16*        @tprel
  R_PPC64_TPREL16_LO           70     half16         #lo(@tprel)
  R_PPC64_TPREL16_HI           71     half16*        #hi(@tprel)
  R_PPC64_TPREL16_HA           72     half16*        #ha(@tprel)
  R_PPC64_TPREL64              73     doubleword64   @tprel
  R_PPC64_DTPREL16             74     half16*        @dtprel
  R_PPC64_DTPREL16_LO          75     half16         #lo(@dtprel)
  R_PPC64_DTPREL16_HI          76     half16*        #hi(@dtprel)
  R_PPC64_DTPREL16_HA          77     half16*        #ha(@dtprel)
  R_PPC64_DTPREL64             78     doubleword64   @dtprel
  R_PPC64_GOT_TLSGD16          79     half16*        @got@tlsgd
  R_PPC64_GOT_TLSGD16_LO       80     half16         #lo(@got@tlsgd)
  R_PPC64_GOT_TLSGD16_HI       81     half16*        #hi(@got@tlsgd)
  R_PPC64_GOT_TLSGD16_HA       82     half16*        #ha(@got@tlsgd)
  R_PPC64_GOT_TLSLD16          83     half16*        @got@tlsld
  R_PPC64_GOT_TLSLD16_LO       84     half16         #lo(@got@tlsld)
  R_PPC64_GOT_TLSLD16_HI       85     half16*        #hi(@got@tlsld)
  R_PPC64_GOT_TLSLD16_HA       86     half16*        #ha(@got@tlsld)
  R_PPC64_GOT_TPREL16_DS       87     half16ds*      @got@tprel
  R_PPC64_GOT_TPREL16_LO_DS    88     half16ds       #lo(@got@tprel)
  R_PPC64_GOT_TPREL16_HI       89     half16*        #hi(@got@tprel)
  R_PPC64_GOT_TPREL16_HA       90     half16*        #ha(@got@tprel)
  R_PPC64_GOT_DTPREL16_DS      91     half16ds*      @got@dtprel
  R_PPC64_GOT_DTPREL16_LO_DS   92     half16ds       #lo(@got@dtprel)
  R_PPC64_GOT_DTPREL16_HI      93     half16*        #hi(@got@dtprel)
  R_PPC64_GOT_DTPREL16_HA      94     half16*        #ha(@got@dtprel)
  R_PPC64_TPREL16_DS           95     half16ds*      @tprel
  R_PPC64_TPREL16_LO_DS        96     half16ds       #lo(@tprel)
  R_PPC64_TPREL16_HIGHER       97     half16         #higher(@tprel)
  R_PPC64_TPREL16_HIGHERA      98     half16         #highera(@tprel)
  R_PPC64_TPREL16_HIGHEST      99     half16         #highest(@tprel)
  R_PPC64_TPREL16_HIGHESTA     100    half16         #highesta(@tprel)
  R_PPC64_DTPREL16_DS          101    half16ds*      @dtprel
  R_PPC64_DTPREL16_LO_DS       102    half16ds       #lo(@dtprel)
  R_PPC64_DTPREL16_HIGHER      103    half16         #higher(@dtprel)
  R_PPC64_DTPREL16_HIGHERA     104    half16         #highera(@dtprel)
  R_PPC64_DTPREL16_HIGHEST     105    half16         #highest(@dtprel)
  R_PPC64_DTPREL16_HIGHESTA    106    half16         #highesta(@dtprel)
  R_PPC64_TLSGD                107    none           none
  R_PPC64_TLSLD                108    none           none
  R_PPC64_TOCSAVE              109    none           none
  R_PPC64_ADDR16_HIGH          110    half16         #hi(S + A)
  R_PPC64_ADDR16_HIGHA         111    half16         #ha(S + A)
  R_PPC64_TPREL16_HIGH         112    half16         #hi(@tprel)
  R_PPC64_TPREL16_HIGHA        113    half16         #ha(@tprel)
  R_PPC64_DTPREL16_HIGH        114    half16         #hi(@dtprel)
  R_PPC64_DTPREL16_HIGHA       115    half16         #ha(@dtprel)
  R_PPC64_REL24_NOTOC          116    low24*         (S + A – P) >> 2
  R_PPC64_ADDR64_LOCAL         117    doubleword64   S + A (See Relocation Descriptions.)
  R_PPC64_ENTRY                118    none           none
  R_PPC64_IRELATIVE            248    doubleword64   See Relocation Descriptions.
  R_PPC64_REL16                249    half16*        S + A – P
  R_PPC64_REL16_LO             250    half16         #lo(S + A – P)
  R_PPC64_REL16_HI             251    half16*        #hi(S + A – P)
  R_PPC64_REL16_HA             252    half16*        #ha(S + A – P)
  R_PPC64_GNU_VTINHERIT        253
  R_PPC64_GNU_VTENTRY          254
  R_PPC64_PCREL34              256    prefix34*      @pcrel
  R_PPC64_PCREL34_DS           257    prefix34ds*    @pcrel >> 2
  R_PPC64_PCREL34_DQ           258    prefix34dq*    @pcrel >> 3
  R_PPC64_PCREL28_DQ           259    prefix28dq*    @pcrel >> 3
  R_PPC64_GOT_PCREL34          260    prefix34*      @got@pcrel
  R_PPC64_GOT_PCREL34_DS       261    prefix34ds*    @got@pcrel >> 2
  R_PPC64_GOT_PCREL34_DQ       262    prefix34dq*    @got@pcrel >> 3
  R_PPC64_GOT_PCREL28_DQ       263    prefix28dq*    @got@pcrel >> 3
  R_PPC64_PCREL_OPT            264
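As a worked instance of one table row, the following sketch applies R_PPC64_REL24 to a branch instruction word. It checks the two rules stated for relocation names containing 24 (the low 2 bits of the value before shifting are zero, and the upper 39 bits are all the same, which is equivalent to fitting in a signed 26-bit range) and then writes the result into the low24 field. The names LOW24_MASK, fits_signed, and apply_rel24 are hypothetical, and the instruction word is assumed to have been read in the file's byte order.

```python
LOW24_MASK = 0x03FFFFFC   # big-endian bits 6-29 of the instruction word


def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def apply_rel24(insn: int, S: int, A: int, P: int) -> int:
    """Apply R_PPC64_REL24: place (S + A - P) >> 2 in the low24 field of insn."""
    delta = S + A - P
    if delta & 3:
        raise ValueError("branch target is not 4-byte aligned relative to P")
    if not fits_signed(delta, 26):
        raise OverflowError("R_PPC64_REL24 value does not fit in low24")
    # The field holds delta >> 2 at bit positions 6-29, which is the same
    # as masking the unshifted byte offset with LOW24_MASK.
    return (insn & ~LOW24_MASK & 0xFFFFFFFF) | (delta & LOW24_MASK)
```

For example, relocating a bl instruction (0x48000001) at address 0x0F00 against a symbol at 0x1000 with a zero addend gives a displacement of 0x100 and an instruction word of 0x48000101.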
Relocation Descriptions The following list describes relocations that can require special handling or description. R_PPC64_GOT16* These relocation types are similar to the corresponding R_PPC64_ADDR16* types. However, they refer to the address of the symbol’s GOT entry and instruct the link editor to build a GOT. R_PPC64_PLTGOT16* These relocation types are similar to the corresponding R_PPC64_GOT16* types. However, if the link editor cannot determine the actual value of the symbol, the GOT entry may contain the address of an entry in the procedure linkage table. The link editor creates that entry in the procedure linkage table and stores that address in the GOT entry. This permits lazy resolution of function symbols at run time. If the link editor can determine the value of the symbol, it stores that value in the corresponding GOT entry. The link editor may generate an R_PPC64_GLOB_DAT relocation as usual. R_PPC64_PLTREL32, R_PPC64_PLTREL64 These relocations indicate that reference to a symbol should be resolved through a call to the symbol’s procedure linkage table entry. Additionally, it instructs the link editor to build a procedure linkage table for the executable or shared object if one is not created. R_PPC64_COPY This relocation type is created by the link editor for dynamic linking. Its offset member refers to a location in a writable segment. The symbol table index specifies a symbol that should exist both in the current relocatable file and in a shared object file. During execution, the dynamic linker copies data associated with the shared object’s symbol to the location specified by the offset. R_PPC64_GLOB_DAT This relocation type allows determination of the correspondence between symbols and GOT entries. It is similar to R_PPC64_ADDR64. However, it sets a GOT entry to the address of the specified symbol. R_PPC64_JMP_SLOT This relocation type is created by the link editor for dynamic linking. 
Its offset member gives the location of a procedure linkage table (PLT) entry. The dynamic linker modifies the PLT entry to transfer control to the designated symbol’s address (see ). R_PPC64_RELATIVE This relocation type is created by the link editor for dynamic linking. Its offset member gives a location within a shared object that contains a value representing a relative address. The corresponding virtual address is computed by the dynamic linker. It adds the virtual address at which the shared object was loaded to the relative address. Relocation entries for this type must specify 0 for the symbol table index. R_PPC64_IRELATIVE The link editor creates this relocation type for dynamic linking. Its addend member specifies the global entry-point location of a resolver function returning a function pointer. It is used to implement the STT_GNU_IFUNC framework. The resolver is called, and the returned pointer copied into the location specified by the relocation offset member. R_PPC64_TLS, R_PPC64_TLSGD, R_PPC64_TLSLD Used as markers on thread local storage (TLS) code sequences, these relocations tie the entire sequence with a particular TLS symbol. For more information, see . R_PPC64_TOCSAVE This relocation type indicates a position where a TOC save may be inserted in the function to avoid a TOC save as part of the PLT stub code. A nop can be emitted by a compiler in a function's prologue code. A link editor can change it to a TOC pointer save instruction. This marker relocation is placed on the prologue nop and on nops after bl instructions, with the symbol plus addend pointing to the prologue nop. If the link editor uses the prologue to save r2, it may omit r2 saves in the PLT call stub code emitted for calls marked by R_PPC64_TOCSAVE. R_PPC64_UADDR* These relocation types are the same as the corresponding R_PPC64_ADDR* types, except that the datum to be relocated is allowed to be unaligned. 
R_PPC64_ADDR64_LOCAL When a separate local entry point exists, this relocation type is used to initialize a memory location with the address of that local entry point.

R_PPC64_REL24_NOTOC This relocation type is used to specify a function call where the TOC pointer is not initialized. It is similar to R_PPC64_REL24 in that it specifies a symbol to be resolved. If the symbol resolves to a function that requires a TOC pointer (as determined by st_other bits), then a link editor must arrange for the call to be via the global entry point of the called function. However, if the symbol is resolved by inserting a call to PLT stub code, the PLT stub code must not rely on the presence of a valid TOC base address in TOC register r2 to reference the PLT function table.

R_PPC64_ENTRY This relocation type may optionally be associated with a global entry point. See for discussion of its use.

R_PPC64_PCREL_OPT This relocation type requests that the annotated instruction and its immediately following instruction be optimized by the linker when the referenced symbol can be statically resolved, or when a more efficient PC-relative sequence can be chosen. See and for details.
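The R_PPC64_RELATIVE processing described above can be sketched in C. This is a minimal illustration, not a dynamic linker's actual code: the toy_rela structure and the apply_relative and read_doubleword helpers are hypothetical names, the "image" is an ordinary byte buffer, and the sketch assumes that the relative address is carried in the relocation's addend.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation record (not the <elf.h> type). */
struct toy_rela {
    uint64_t r_offset;  /* location to patch, relative to the image start */
    uint64_t r_sym;     /* symbol table index; must be 0 for RELATIVE */
    int64_t  r_addend;  /* assumed to hold the relative address */
};

/* Store load_base + addend into the doubleword at r_offset. */
static void apply_relative(unsigned char *image, size_t image_size,
                           uint64_t load_base, const struct toy_rela *rel)
{
    assert(rel->r_sym == 0);
    assert(rel->r_offset + sizeof(uint64_t) <= image_size);
    uint64_t value = load_base + (uint64_t)rel->r_addend;
    memcpy(image + rel->r_offset, &value, sizeof value);
}

/* Read back a doubleword from the image (host byte order). */
static uint64_t read_doubleword(const unsigned char *image, uint64_t offset)
{
    uint64_t value;
    memcpy(&value, image + offset, sizeof value);
    return value;
}
```

For instance, with a load base of 0x10000000 and an addend of 0x1234, the patched doubleword holds 0x10001234.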
Assembler Syntax

The offset from .TOC. in the GOT where the value of the symbol is stored is given by the assembly syntax symbol@got. The value of the symbol alone is the address of the variable named symbol. For example:

addis r3, r2, x@got@ha
ld r3, x@got@l(r3)

Although the Power ISA only defines 16-bit displacements, many TOCs (and hence GOTs) are larger than 64 KB but fit within 2 GB, which can be addressed with 32-bit offsets from r2. Therefore, this ABI defines a simple syntax for 32-bit offsets to the GOT. The syntaxes SYMBOL@got@ha, SYMBOL@got@h, and SYMBOL@got@l refer to the high adjusted, high, and low parts of the GOT offset. (For an explanation of the meaning of “high adjusted,” see ). SYMBOL@got@ha corresponds to bits 32–63 of the offset within the global offset table with adjustment for the sign extension of the low-order offset bits. SYMBOL@got@l corresponds to the 16 low-order bits of the offset within the global offset table.

The syntax SYMBOL@toc refers to the value (SYMBOL – .TOC.), where .TOC. represents the TOC base for the current object file. This provides the address of the variable whose name is SYMBOL as an offset from the TOC base. As with the GOT, the syntaxes SYMBOL@toc@ha, SYMBOL@toc@h, and SYMBOL@toc@l refer to the high adjusted, high, and low parts of the TOC offset.

The syntax SYMBOL@got@plt may be used to refer to the offset in the TOC of a procedure linkage table entry stored in the global offset table. The corresponding syntaxes SYMBOL@got@plt@ha, SYMBOL@got@plt@h, and SYMBOL@got@plt@l are also defined.

If X is a variable stored in the TOC, then X@got is the offset within the TOC of a doubleword whose value is X@toc. The special symbol .TOC. is used to represent the TOC base for the current object file.
The following code might appear in a PIC code setup sequence to compute the distance from a function entry point to the TOC base: addis 2,12,.TOC.-func@ha addi 2,2,.TOC.-func@l The syntax SYMBOL@localentry refers to the value of the local entry point associated with a function symbol. It can be used to initialize a memory word with the address of the local entry point as follows: .quad func@localentry
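The relationship between the @ha, @h, and @l operators can be illustrated with a short C sketch. The function names (part_lo, part_hi, part_ha, recombine) are hypothetical and exist only for this illustration. The sketch assumes the usual definition of "high adjusted": 0x8000 is added before taking the upper halfword, to compensate for the sign extension of the low halfword by the D-form instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 16-bit field, as addis and D-form displacements do. */
static int64_t sext16(uint16_t v)
{
    return v >= 0x8000u ? (int64_t)v - 0x10000 : (int64_t)v;
}

/* @l: the low 16 bits of the offset. */
static uint16_t part_lo(int32_t offset)
{
    return (uint16_t)((uint32_t)offset & 0xffffu);
}

/* @h: the plain upper 16 bits of the offset. */
static uint16_t part_hi(int32_t offset)
{
    return (uint16_t)(((uint32_t)offset >> 16) & 0xffffu);
}

/* @ha: upper 16 bits, adjusted for the sign of the low part. */
static uint16_t part_ha(int32_t offset)
{
    /* Larger offsets cannot be reached by an addis + D-form pair. */
    assert(offset < 0x7fff8000);
    return (uint16_t)((((uint32_t)offset + 0x8000u) >> 16) & 0xffffu);
}

/* Model of "addis rt, ra, ha" followed by a D-form access "lo(rt)". */
static int64_t recombine(uint16_t ha, uint16_t lo)
{
    return sext16(ha) * 65536 + sext16(lo);
}
```

For an offset of 0x12348000, the low part is 0x8000 (which sign-extends to a negative value), so the high adjusted part is 0x1235 rather than 0x1234, and the two parts recombine to the original offset.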
Assembler- and Linker-Mediated Executable Optimization To optimize object code, the assembler and linker may rewrite object code to implement the function call and return conventions and access to global and thread-local data. It is the responsibility of compilers and programmers to generate assembly programs and objects that conform to the requirements as indicated in this section.
Function Call Unless the bl instruction is annotated with an R_PPC64_REL24_NOTOC relocation, the static linker must modify a nop instruction after a bl function call to restore the TOC pointer in r2 from 24(r1) when an external symbol that may use the TOC may be called, as in . A function must contain a nop slot after a bl instruction to an external symbol.
Reference Optimization References to the GOT may be optimized by rewriting indirect reference code to replace the reference by an address computation. This transformation is only performed by the linker when the symbol is known to be local to the module.
Displacement Optimization for TOC Pointer Relative Accesses Assemblers and linkers may optimize TOC reference code that consists of two instructions with equivalent code when offset@ha is 0. TOC reference code: addis rt, r2, offset@ha lwz rt, offset@l(rt) Equivalent code: NOP lwz rt, offset(r2) Compilers and programmers must ensure that r2 is live at the actual data access point associated with extended displacement addressing.
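The condition "offset@ha is 0" can be expressed as a simple range check. The following C sketch uses hypothetical helper names and assumes the usual definition of the high adjusted part, under which offset@ha is zero exactly when the offset fits in a signed 16-bit displacement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when offset@ha is 0, so the addis can become a nop. */
static bool toc_offset_ha_is_zero(int64_t offset)
{
    return offset >= -0x8000 && offset <= 0x7fff;
}

/* The displacement used directly against r2 in the rewritten access. */
static int16_t short_displacement(int64_t offset)
{
    assert(toc_offset_ha_is_zero(offset));
    return (int16_t)offset;
}
```

An assembler or linker applying this optimization would only rewrite the two-instruction sequence when the check succeeds; otherwise the original addis must be kept.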
TOC Pointer Usage

To enable linker-based optimizations when global data is accessed, the TOC pointer needs to be available for dereference at the point of all uses of values derived from the TOC pointer in conjunction with the @l operator. This property is used by the linker to optimize TOC pointer accesses. In addition, all reaching definitions for a TOC-pointer-derived access must compute the same definition. In some implementations, non-ABI-compliant code may be processed by providing additional linker options; for example, linker options disabling linker optimization. However, this behavior in support of non-ABI-compliant code is not guaranteed to be portable and supported in all systems.

Compliant example

addis r4, r2, mysym@toc@ha
b target
...
addis r4, r2, mysym@toc@ha
target:
addi r4, r4, mysym@toc@l
...

Non-compliant example

li r4, 0 ; #d1
b target
...
addis r4, r2, mysym@toc@ha ; #d2
target:
addi r4, r4, mysym@toc@l ; incompatible definitions #d1 and #d2 reach this
...
Table Jump Sequences Some linkers may rewrite jump table sequences, as described in . For example, linkers may rewrite address references created using GOT-indirect loads and bl+4 sequences to use TOC-relative address computation.
Displacement Optimization for PC-Relative Accesses Compilers and assembly programmers must assume that references to extern data having unrestricted visibility may be satisfied by a dynamically linked object, and must therefore use PC-relative GOT-indirect addressing for such references. A linker may determine that such a reference is satisfied during static linking and replace the reference with direct PC-relative addressing. For example: pld r12, symbol@got@pcrel lvx v1, 0, r12 The previous sequence may be replaced by: plxv v1, symbol@pcrel nop However, this optimization is not universally safe, since it changes the value of r12 following the data reference. The compiler or programmer must ensure that the value of r12 is not subsequently used, and communicate a request for this optimization by placing an R_PPC64_PCREL_OPT relocation on the first instruction in the sequence. The compiler or programmer must further ensure that the two instructions are not separated by intervening instructions.
Optimization of Masked Load/Store Sequences PC-relative forms of the pmlxv and pmstxv instructions have a 28-bit offset, which is too small to guarantee that the offset will not overflow when relocated within a medium code model binary. Compilers should not directly generate PC-relative forms of these instructions, but may instead generate a short sequence that can be optimized by a linker. For example: paddi r12,symbol@pcrel pmlxvx v1,r10,r12,VRM,MC,P,0 The previous sequence may be replaced by: dnop pmlxv v1,symbol@pcrel(r10),VRM,MC,P,1 when the linker determines that the offset from the current instruction address to symbol's address will fit in 28 bits. Again, this optimization is not universally safe, since it changes the value of r12 following the data reference. The compiler or programmer must ensure that the value of r12 is not subsequently used, and communicate a request for this optimization by placing an R_PPC64_PCREL_OPT relocation on the first instruction in the sequence. The compiler or programmer must further ensure that the two instructions are not separated by intervening instructions.
Fusion

To enable hardware fusion whenever possible, compilers, linkers, and programmers should create or load from a 32-bit offset from a register by using a destructive sequence of two adjacent instructions: an addis, immediately followed by a D-form instruction that consumes and overwrites the addis result:

addis r4, r3, upper
<lbz,lhz,lwz,ld> r4, lower(r4)

addis r4, r3, upper
addi r4, r4, lower

Assemblers are encouraged to provide pseudo-ops that generate such sequences from a single assembler mnemonic.
Thread-Local Linker Optimizations Additional code rewriting is performed by the linker in conjunction with the use of thread-local storage described in .
Thread Local Storage ABI The ELF Handling for Thread-Local Storage document is the authoritative TLS ABI specification that defines the context in which information in the TLS section of this Power Architecture 64-bit ELF V2 ABI must be viewed. For information about how to access this document, see . To maintain congruence with that document, in this section the term module refers to an executable or shared object since both are treated similarly.
TLS Background Most C/C++ implementations support (as an extension to earlier versions of the language) the keyword __thread to be used as a storage-class specifier in variable declarations and definitions of data objects with thread storage duration. (The 2011 ISO C Standard uses _Thread_local as the keyword, while the 2011 ISO C++ Standard uses thread_local.) A variable declared in this manner is automatically allocated local to each thread. Its lifetime is defined to be the entire execution of the thread. Any initialization value is assigned once before thread startup.
TLS Runtime Handling A thread-local variable is completely identified by the module in which it is defined, along with the offset of the variable relative to the start of the TLS block for the module. A module is referenced by its index (an integer starting with 1, which is assigned by the run-time environment) into the dynamic thread vector (DTV). The offset of the variable is kept in the st_value field of the TLS variable’s symbol table entry. The TLS data structures follow variant I of the ELF TLS ABI. For the 64-bit PowerPC Architecture, the specific organization of the data structures is as follows. The thread control block (TCB) consists of the DTV, which is an 8-byte pointer. An extended TCB may have additional implementation-specific fields; these fields are located before the DTV pointer because the addresses are computed as negative offsets from the TCB address. The fields must never be rearranged for any reason. The current glibc extended TCB is: typedef struct { /* Reservation for HWCAP data. */ unsigned int hwcap2; unsigned int hwcap; /* not used in LE ABI */ /* Indicate if HTM capable (ISA 2.07). */ int tm_capable; int tm_pad; /* Reservation for dynamic system optimizer ABI. */ uintptr_t dso_slot2; uintptr_t dso_slot1; /* Reservation for tar register (ISA 2.07). */ uintptr_t tar_save; /* GCC split stack support. */ void *__private_ss; /* Reservation for the event-based branching ABI. */ uintptr_t ebb_handler; uintptr_t ebb_ctx_pointer; uintptr_t ebb_reserved1; uintptr_t ebb_reserved2; uintptr_t pointer_guard; /* Reservation for stack guard */ uintptr_t stack_guard; /* DTV pointer */ dtv_t *dtv; } tcbhead_t; Modules that will not be unloaded will be present at startup time; the TLS blocks for these are created consecutively and immediately follow the TCB. The offset of the TLS block of an initially available module from the TCB remains fixed after program start. 
The tlsoffset(m) values for a module with index m, where m ranges from 1 to M, M being the total number of modules, are computed as follows:

tlsoffset(1) = round(16, align(1))
tlsoffset(m + 1) = round(tlsoffset(m) + tlssize(m), align(m + 1))

The function round( ) returns its first argument rounded up to the next multiple of its second argument:

round(x, y) = y × ceiling(x / y)

The function ceiling( ) returns the smallest integer greater than or equal to its argument; that is, ceiling(x) = n, where n is the integer satisfying n – 1 < x ≤ n.

In the case of dynamic shared objects (DSO), TLS blocks are allocated on an as-needed basis, with the details of allocation abstracted away by the __tls_get_addr( ) function, which is used to retrieve the address of any TLS variable. The prototype for the __tls_get_addr( ) function is defined as follows:

typedef struct { unsigned long int ti_module; unsigned long int ti_offset; } tls_index;
extern void *__tls_get_addr (tls_index *ti);

The thread pointer (TP) is held in r13 and is used to access the TCB. The TP is initialized to point 0x7000 bytes past the end of the TCB. The TP offset allows for efficient addressing of the TCB and up to 4 KB – 8 B of other thread library information (placed before the TCB). shows the region of memory before and after the TCB that can be efficiently addressed by the TP.
Thread Pointer Addressable Memory
Each DTV pointer points 0x8000 bytes past the start of each TLS block. (For implementation reasons, the actual value stored in the DTV may point to the start of a TLS block. However, values returned by accessor functions will be offset by 0x8000 bytes.) This offset allows the first 64 KB of each block to be addressed from a DTV pointer using fewer machine instructions.
TLS Block Diagram
TLS[m] denotes the TLS block for the module with index m. DTV[m] denotes the DTV pointer for the module with index m.
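The tlsoffset(m) recurrence for initially available modules can be sketched in C as follows. The structure tls_module and the functions round_up and compute_tls_offsets are hypothetical names chosen for this illustration. Array index i corresponds to module index i + 1, and the sizes and alignments passed in are made-up inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one module's TLS block. */
struct tls_module {
    uint64_t size;   /* tlssize(m) */
    uint64_t align;  /* align(m) */
};

/* round(x, y): x rounded up to the next multiple of y. */
static uint64_t round_up(uint64_t x, uint64_t y)
{
    assert(y != 0);
    return y * ((x + y - 1) / y);
}

/* Fill offsets[0..count-1] with tlsoffset(1)..tlsoffset(count). */
static void compute_tls_offsets(const struct tls_module *mods, size_t count,
                                uint64_t *offsets)
{
    for (size_t i = 0; i < count; i++) {
        /* The first block starts from 16; later blocks follow their
           predecessor's end. */
        uint64_t start = (i == 0) ? 16 : offsets[i - 1] + mods[i - 1].size;
        offsets[i] = round_up(start, mods[i].align);
    }
}
```

With three modules of (size, align) equal to (20, 8), (4, 16), and (1, 1), the computed offsets are 16, 48, and 52.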
TLS Access Models TLS data access is categorized into the following models: General Dynamic TLS Model Local Dynamic TLS Model Initial Exec TLS Model Local Exec TLS Model Examples for each access model are provided in the following TLS Model subsections.
General Dynamic TLS Model

This specification provides examples based on the medium code model, which is the default for the ELF V2 ABI. Given the following code fragment, to determine the address of a thread-local variable x, the __tls_get_addr( ) function is called with one parameter. That parameter is a pointer to a data object of type tls_index.

extern __thread unsigned int x;
&x;

General Dynamic Initial Relocations

Code Sequence                     Relocation                Symbol
addis r3, r2, x@got@tlsgd@ha      R_PPC64_GOT_TLSGD16_HA    x
addi r3, r3, x@got@tlsgd@l        R_PPC64_GOT_TLSGD16_LO    x
bl __tls_get_addr(x@tlsgd)        R_PPC64_TLSGD             x
                                  R_PPC64_REL24             __tls_get_addr
nop

General Dynamic GOT Entry Relocations

Code Sequence    Relocation          Symbol
GOT[n]           R_PPC64_DTPMOD64    x
GOT[n+1]         R_PPC64_DTPREL64    x
The relocation specifier @got@tlsgd causes the link editor to create a data object of type tls_index in the GOT. The address of this data object is loaded into the first argument register with the addis and addi instruction, and a standard function call is made. Notice that the bl instruction has two relocations: the R_PPC64_TLSGD tying it to the argument setup instructions and the R_PPC64_REL24 specifying the call destination.
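To make the role of the tls_index object concrete, the following C sketch models the lookup for a single thread. It is a toy model, not the implementation of __tls_get_addr( ) in any C library: the names toy_dtv, toy_register_block, toy_dtprel, and toy_tls_get_addr are hypothetical. It assumes that each DTV entry is biased by 0x8000 as described for DTV pointers, and that the offset stored in the second GOT doubleword is biased by the same amount in the opposite direction, so that the sum is the variable's address.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_DTV_BIAS 0x8000u
#define TOY_MAX_MODULES 4

/* The sketch relies on an LP64-style target, as on 64-bit Power. */
_Static_assert(sizeof(unsigned long) == sizeof(uintptr_t),
               "unsigned long and uintptr_t must have the same width");

typedef struct {
    unsigned long int ti_module;
    unsigned long int ti_offset;
} tls_index;

/* Toy DTV for one thread; module indexes start at 1, entry 0 is unused. */
static uintptr_t toy_dtv[TOY_MAX_MODULES + 1];

/* Record a module's TLS block, storing the biased pointer. */
static void toy_register_block(unsigned long module, void *block_start)
{
    assert(module >= 1 && module <= TOY_MAX_MODULES);
    toy_dtv[module] = (uintptr_t)block_start + TOY_DTV_BIAS;
}

/* Assumed encoding of a block offset: biased by -0x8000 (wraps by design). */
static unsigned long toy_dtprel(unsigned long offset_in_block)
{
    return offset_in_block - TOY_DTV_BIAS;
}

/* Toy lookup: biased DTV pointer plus biased offset gives the address. */
static void *toy_tls_get_addr(const tls_index *ti)
{
    assert(ti->ti_module >= 1 && ti->ti_module <= TOY_MAX_MODULES);
    return (void *)(toy_dtv[ti->ti_module] + (uintptr_t)ti->ti_offset);
}
```

In this model, the pair GOT[n], GOT[n+1] corresponds to the ti_module and ti_offset members, which the dynamic linker would fill in through the R_PPC64_DTPMOD64 and R_PPC64_DTPREL64 relocations.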
Local Dynamic TLS Model For the Local Dynamic TLS Model, three different relocation sequences may be used, depending on the size of the thread storage block offset to the variable. For the following code sequence, a different relocation sequence is used for each variable. static __thread unsigned int x1; static __thread unsigned int x2; static __thread unsigned int x3; &x1; &x2; &x3; Local Dynamic Initial Relocations Code Sequence Relocation Symbol addis r3, r2, x1@got@tlsld@ha R_PPC64_GOT_TLSLD16_HA x1 addi r3, r3, x1@got@tlsld@l R_PPC64_GOT_TLSLD16_LO x1 bl __tls_get_addr(x1@tlsld) R_PPC64_TLSLD x1 R_PPC64_REL24 __tls_get_addr nop ... addi r9, r3, x1@dtprel R_PPC64_DTPREL16 x1 ... addis r9, r3, x2@dtprel@ha R_PPC64_DTPREL16_HA x2 addi r9, r9, x2@dtprel@l R_PPC64_DTPREL16_LO x2 ... addis r9, r2, x3@got@dtprel@ha R_PPC64_GOT_DTPREL16_HA x3 ld r9, x3@got@dtprel@l(r9) R_PPC64_GOT_DTPREL16_LO_DS x3 add r9, r9, r3
Local Dynamic GOT Entry Relocations Code Sequence Relocation Symbol GOT[n] R_PPC64_DTPMOD64 x1 GOT[n+1] 0 GOT[m] R_PPC64_DTPREL64 x3
The relocation specifier @got@tlsld in the first instruction causes the link editor to generate a tls_index data object in the GOT with a fixed 0 offset. The following code assumes that x1 is in the first 64 KB of the thread storage block. The x2 symbol is not within the first 64 KB but is within the first 2 GB, and x3 is outside the 2 GB area. To load the values of x1, x2, and x3 instead of their addresses, replace the latter part of with the following code sequence. Local Dynamic Relocations with Values Loaded Code Sequence Relocation Symbol ... lwz r0, x1@dtprel(r3) R_PPC64_DTPREL16 x1 ... addis r9, r3, x2@dtprel@ha R_PPC64_DTPREL16_HA x2 lwz r0, x2@dtprel@l(r9) R_PPC64_DTPREL16_LO x2 ... addis r9, r2, x3@got@dtprel@ha R_PPC64_GOT_DTPREL16_HA x3 ld r9, x3@got@dtprel@l(r9) R_PPC64_GOT_DTPREL16_LO_DS x3 lwzx r0, r3, r9
Initial Exec TLS Model Given the following code fragment, the relocation sequence in is used for the Initial Exec TLS Model: extern __thread unsigned int x; &x; Initial Exec Initial Relocations Code Sequence Relocation Symbol addis r9, r2, x@got@tprel@ha R_PPC64_GOT_TPREL16_HA x ld r9, x@got@tprel@l(r9) R_PPC64_GOT_TPREL16_LO_DS x add r9, r9, x@tls R_PPC64_TLS x
Initial Exec GOT Entry Relocations Code Sequence Relocation Symbol GOT[n] R_PPC64_TPREL64 x
The relocation specifier @got@tprel in the first instruction causes the link editor to generate a GOT entry with a relocation that the dynamic linker will replace with the offset for x relative to the thread pointer. The relocation specifier x@tls tells the assembler to use an r13 form of the instruction. That is, add r9,r9,r13 in this case, and tag the instruction with a relocation that indicates it belongs to a TLS sequence. This relocation specifier can be used later by the link editor when optimizing TLS code. To read the contents of the variable instead of calculating its address, the add r9, r9, x@tls instruction might be replaced with lwzx r0, r9, x@tls.
Local Exec TLS Model Given the following code fragment, three different relocation sequences may be used, depending on the size of the offset to the variable. The sequence in handles offsets within 60 KB relative to the end of the TCB (where r13 points 28 KB past the end of the TCB, which is immediately before the first TLS block). The sequence in handles offsets past 60 KB and less than 2 GB + 28 KB relative to the end of the TCB. The third sequence is identical to the Initial Exec sequence shown in . static __thread unsigned int x; &x; illustrates which sequence is used.
Local Exec TLS Model Sequences
Local Exec Initial Relocations (Sequence 1)

Code Sequence               Relocation            Symbol
addi r9, r13, x@tprel       R_PPC64_TPREL16       x

Local Exec Initial Relocations (Sequence 2)

Code Sequence               Relocation            Symbol
addis r9, r13, x@tprel@ha   R_PPC64_TPREL16_HA    x
addi r9, r9, x@tprel@l      R_PPC64_TPREL16_LO    x
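The choice among the three Local Exec sequences depends only on the variable's offset relative to the end of the TCB. The following C sketch encodes the 60 KB and 2 GB + 28 KB limits stated above. The enum and function names are hypothetical, and treating each limit as an exclusive upper bound is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define TP_BIAS 0x7000 /* r13 points 28 KB past the end of the TCB */

enum le_sequence {
    LE_SEQ_ADDI = 1,       /* sequence 1: single addi with @tprel */
    LE_SEQ_ADDIS_ADDI = 2, /* sequence 2: addis/addi with @tprel@ha/@l */
    LE_SEQ_GOT = 3         /* sequence 3: same as the Initial Exec sequence */
};

/* Thread-pointer-relative offset of a location past the end of the TCB. */
static int64_t tprel_from_tcb_end(uint64_t offset_from_tcb_end)
{
    assert(offset_from_tcb_end <= INT64_MAX);
    return (int64_t)offset_from_tcb_end - TP_BIAS;
}

/* Pick the shortest sequence able to reach the given offset. */
static enum le_sequence pick_local_exec_sequence(uint64_t offset_from_tcb_end)
{
    const uint64_t limit_16 = UINT64_C(60) * 1024;                 /* 60 KB */
    const uint64_t limit_32 = (UINT64_C(2) << 30) + UINT64_C(28) * 1024;

    if (offset_from_tcb_end < limit_16)
        return LE_SEQ_ADDI;
    if (offset_from_tcb_end < limit_32)
        return LE_SEQ_ADDIS_ADDI;
    return LE_SEQ_GOT;
}
```

The 60 KB limit follows from the thread pointer bias: an offset of 60 KB from the end of the TCB is 0x8000 bytes from r13, which no longer fits in a signed 16-bit displacement.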
TLS Link Editor Optimizations In some cases, the link editor may be able to optimize TLS code sequences, provided the compiler emits code sequences as described. The following TLS link editor transformations are provided as optimizations to convert between specific TLS access models: General Dynamic to Initial Exec General Dynamic to Local Exec Local Dynamic to Local Exec Initial Exec to Local Exec
General Dynamic to Initial Exec General-Dynamic-to-Initial-Exec Initial Relocations Code Sequence Relocation Symbol addis r3, r2, x@got@tlsgd@ha R_PPC64_GOT_TLSGD16_HA x addi r3, r3, x@got@tlsgd@l R_PPC64_GOT_TLSGD16_LO x bl __tls_get_addr(x@tlsgd) R_PPC64_TLSGD x R_PPC64_REL24 __tls_get_addr nop
General-Dynamic-to-Initial-Exec GOT Entry Relocations Code Sequence Relocation Symbol GOT[n] R_PPC64_DTPMOD64 x GOT[n+1] R_PPC64_DTPREL64 x
The preceding code and global offset table entries are replaced by the following code and global offset table entries. General-Dynamic-to-Initial-Exec Replacement Initial Relocations Code Sequence Relocation Symbol addis r3, r2, x@got@tprel@ha R_PPC64_GOT_TPREL16_HA x ld r3, x@got@tprel@l(r3) R_PPC64_GOT_TPREL16_LO_DS x nop add r3, r3, r13
General-Dynamic-to-Initial-Exec Replacement GOT Entry Relocations Code Sequence Relocation Symbol GOT[n] R_PPC64_TPREL64 x
General Dynamic to Local Exec General-Dynamic-to-Local-Exec Initial Relocations Code Sequence Relocation Symbol addis r3, r2, x@got@tlsgd@ha R_PPC64_GOT_TLSGD16_HA x addi r3, r3, x@got@tlsgd@l R_PPC64_GOT_TLSGD16_LO x bl __tls_get_addr(x@tlsgd) R_PPC64_TLSGD x R_PPC64_REL24 __tls_get_addr nop
General-Dynamic-to-Local-Exec GOT Entry Relocations Code Sequence Relocation Symbol GOT[n] R_PPC64_DTPMOD64 x GOT[n+1] R_PPC64_DTPREL64 x
The preceding code and global offset table entries are replaced by the following code, which makes no reference to GOT entries. The GOT entries in can be removed from the GOT by the linker when performing this code transformation. To further optimize the code in , a linker may reschedule the sequence to exploit fusion by generating a sequence that may be fused by Power processors: nop addis r3, r13, x@tprel@ha addi r3, r3, x@tprel@l nop General-Dynamic-to-Local-Exec Replacement Initial Relocations Code Sequence Relocation Symbol nop addis r3, r13, x@tprel@ha R_PPC64_TPREL16_HA x nop addi r3, r3, x@tprel@l R_PPC64_TPREL16_LO x
Local Dynamic to Local Exec Under this TLS linker optimization, the function call is replaced with an equivalent code sequence. However, as shown in the following code examples, the dtprel sequences are left unchanged. Local-Dynamic-to-Local-Exec Initial Relocations Code Sequence Relocation Symbol addis r3, r2, x1@got@tlsld@ha R_PPC64_GOT_TLSLD16_HA x1 addi r3, r3, x1@got@tlsld@l R_PPC64_GOT_TLSLD16_LO x1 bl __tls_get_addr(x1@tlsld) R_PPC64_TLSLD x1 R_PPC64_REL24 __tls_get_addr nop ... addi r9, r3, x1@dtprel R_PPC64_DTPREL16 x1 ... addis r9, r3, x2@dtprel@ha R_PPC64_DTPREL16_HA x2 addi r9, r9, x2@dtprel@l R_PPC64_DTPREL16_LO x2 ... addis r9, r2, x3@got@dtprel@ha R_PPC64_GOT_DTPREL16_HA x3 ld r9, x3@got@dtprel@l(r9) R_PPC64_GOT_DTPREL16_LO_DS x3 add r9, r9, r3
Local-Dynamic-to-Local-Exec GOT Entry Relocations Code Sequence Relocation Symbol GOT[n] R_PPC64_DTPMOD64 x1 GOT[n+1] ... GOT[m] R_PPC64_DTPREL64 x3
The preceding code and global offset table entries are replaced by the following code and global offset table entries.

Local-Dynamic-to-Local-Exec Replacement Initial Relocations

Code Sequence                      Relocation                    Symbol
nop
addis r3, r13, L@tprel@ha          R_PPC64_TPREL16_HA            link editor generated local symbol
nop
addi r3, r3, L@tprel@l             R_PPC64_TPREL16_LO            link editor generated local symbol
...
addi r9, r3, x1@dtprel             R_PPC64_DTPREL16              x1
...
addis r9, r3, x2@dtprel@ha         R_PPC64_DTPREL16_HA           x2
addi r9, r9, x2@dtprel@l           R_PPC64_DTPREL16_LO           x2
...
addis r9, r2, x3@got@dtprel@ha     R_PPC64_GOT_DTPREL16_HA       x3
ld r9, x3@got@dtprel@l(r9)         R_PPC64_GOT_DTPREL16_LO_DS    x3
add r9, r9, r3

Note: The linker may prefer to schedule the addis and addi to be adjacent to take advantage of fusion as a microarchitecture optimization opportunity.
The GOT[n] and GOT[n+1] entries can be removed by the linker after the code transformation as shown in . Local-Dynamic-to-Local-Exec Replacement GOT Entry Relocations Code Sequence Relocation Symbol GOT[m] R_PPC64_DTPREL64 x3
The local symbol generated by the link editor points to the start of the thread storage block plus 0x7000 bytes. In practice, a section symbol with a suitable offset will be used.
Initial Exec to Local Exec This transformation is only performed by the linker when the symbol is within 2 GB + 28 KB of the thread pointer. Initial-Exec-to-Local-Exec Initial Relocations Code Sequence Relocation Symbol addis r9, r2, x@got@tprel@ha R_PPC64_GOT_TPREL16_HA x ld r9, x@got@tprel@l(r9) R_PPC64_GOT_TPREL16_LO_DS x add r9, r9, x@tls R_PPC64_TLS x
Initial-Exec-to-Local-Exec GOT Entry Relocations Code Sequence Relocation Symbol GOT[n] R_PPC64_TPREL64 x
The preceding code and global offset table entries are replaced by the following code and global offset table entries. Initial-Exec-to-Local-Exec Replacement Initial Relocations Code Sequence Relocation Symbol nop addis r9, r13, x@tprel@ha R_PPC64_TPREL16_HA x addi r9, r9, x@tprel@l R_PPC64_TPREL16_LO x
Other sizes and types of thread-local variables may use any of the X-form indexed load or store instructions. shows how to access the contents of a variable using the X-form indexed load and store instructions. Initial-Exec-to-Local-Exec X-form Initial Relocations Code Sequence Relocation Symbol addis r9, r2, x@got@tprel@ha R_PPC64_GOT_TPREL16_HA x ld r9, x@got@tprel@l(r9) R_PPC64_GOT_TPREL16_LO_DS x lbzx r10, r9, x@tls R_PPC64_TLS x addi r10, r10, 1 stbx r10, r9, x@tls R_PPC64_TLS x
Initial-Exec-to-Local-Exec X-form GOT Entry Relocations Code Sequence Relocation Symbol GOT[n] R_PPC64_TPREL64 x
The preceding code and global offset table entries are replaced by the following code and global offset table entries. Initial-Exec-to-Local-Exec X-form Replacement Initial Relocations Code Sequence Relocation Symbol nop addis r9, r13, x@tprel@ha R_PPC64_TPREL16_HA x lbz r10, x@tprel@l(r9) R_PPC64_TPREL16_LO x addi r10, r10, 1 stb r10, x@tprel@l(r9) R_PPC64_TPREL16_LO x
ELF TLS Definitions The result of performing a relocation for a TLS symbol is the module ID and its offset within the TLS block. These are then stored in the GOT. Later, they are obtained by the dynamic linker at run-time and passed to __tls_get_addr( ), which returns the address for the variable for the current thread. For more information, see . For TLS relocations, see .   TLS Relocation Descriptions The following marker relocations tie together instructions in TLS code sequences. They allow the link editor to reliably optimize TLS code. R_PPC64_TLSGD and R_PPC64_TLSLD shall be emitted immediately before their associated __tls_get_addr call relocation. R_PPC64_TLS R_PPC64_TLSGD R_PPC64_TLSLD
System Support Functions and Extensions
Back Chain Systems must provide a back chain by default: both compilers and system libraries must allocate a back chain. Alternate libraries that do not provide a back chain may be supplied in addition to, but never instead of, those that do. Generating and using a back chain shall be the default for compilers, linkers, and library selection.
Nested Functions Nested functions that access their ancestors’ stack frames are entered with r11 initialized to an environment pointer. The environment pointer is typically a copy of the stack pointer for the most recent instance of the nested function's parent's stack frame. When a function pointer to a nested function referencing its outer context is created, an implementation may create a trampoline that loads the current environment pointer into r11 and then branches unconditionally to the code of the nested function in the text segment. When a trampoline is used, a pointer to a nested function is represented by the code address of the trampoline. In some environments, the trampoline code may be created by allocating memory on the data stack, which requires making at least the pages containing trampolines executable. In other environments, executable pages may be prohibited in the stack area for security reasons. Alternate implementations, such as creating code stacks for allocating nested function trampolines, may be used. In garbage-collected environments, yet other ways of managing trampolines are available.
Traceback Tables To support debuggers and exception handlers, the 64-bit OpenPOWER ELF V2 ABI defines the use of descriptive debug and unwind information (for example, DWARF) that enables flexible debugging and unwinding of optimized code. To support legacy tooling, the OpenPOWER ELF V2 ABI also specifies the use of a traceback table that may provide additional information about functions. describes a minimal set of fields that may, optionally, specify information about a function. Additional fields may be present in a traceback table in accordance with commonly used PowerPC traceback conventions in other environments, but they are not specified in the current ABI definition.
Traceback Table Fields

If a traceback table is present, the following fields are mandatory:

version Eight-bit field. This defines the type code for the table. The only currently defined value is zero.

lang Eight-bit field. This defines the source language for the compiler that generated the code to which this traceback table applies. The default values are as follows:

C             0
Fortran       1
Pascal        2
Ada           3
PL/1          4
Basic         5
LISP          6
COBOL         7
Modula2       8
C++           9
RPG           10
PL.8, PLIX    11
Assembly      12
Java          13
Objective C   14

The codes 0xf–0xfa are reserved. The codes 0xfb–0xff are reserved for IBM.

globalink One-bit field. This field is set to 1 if this routine is a special routine used to support the linkage convention: a linkage function including a procedure linkage table function, pointer glue code, a trampoline, or other compiler- or linker-generated functions that stack traceback functions should skip, other than is_eprol functions. For more information, see . These routines have an unusual register usage and stack format.

is_eprol One-bit field. This field is set to 1 if this routine is an out-of-line prologue or epilogue function, including a register save or restore function. Stack traceback functions should skip these. For more information, see . These routines have an unusual register usage and stack format.

has_tboff One-bit field. This field is set to 1 if the offset of the traceback table from the start of the function is stored in the tb_offset field.

int_proc One-bit field. This field is set to 1 if this function is a stackless leaf function that does not have a separate stack frame.

has_ctl One-bit field. This field is set to 1 if ctl_info is provided.

tocless One-bit field. This field is set to 1 if this function does not have a TOC. For example, a stackless leaf assembly language routine with no references to external objects.

fp_present One-bit field. This field is set to 1 if the function uses floating-point processor instructions.

log_abort One-bit field. Reserved.

int_handl One-bit field. Reserved.

name_present One-bit field. This field is set to 1 if the name for the procedure is present following the traceback field, as determined by the name_len and name fields.

uses_alloca One-bit field. This field is set to 1 if the procedure performs dynamic stack allocation. To address their local variables, these procedures require a different register to hold the stack pointer value. This register may be chosen by the compiler, and must be indicated by setting the value of the alloc_reg field.

cl_dis_inv Three-bit field. Reserved.

saves_cr One-bit field. This field indicates whether the CR fields are saved in the CR save word. If traceback tables are used in place of DWARF unwind information, at least all volatile CR fields must be saved in the CR save word.

saves_lr One-bit field. This field is set to 1 if the function saves the LR in the LR save doubleword.

stores_bc One-bit field. This field is set to 1 if the function saves the back chain (the SP of its caller) in the stack frame header.

fixup One-bit field. This field is set to 1 if the link editor replaced the original instruction by a branch instruction to a special fix-up instruction sequence.

fp_saved Six-bit field. This field is set to the number of nonvolatile floating-point registers that the function saves. When traceback unwind and debug information is used, the last register saved is always f31. Therefore, for example, a value of 2 in this field indicates that f30 and f31 are saved.

has_vec_info One-bit field. This field is set to 1 if the procedure saves nonvolatile vector registers in the Vector Register Save Area, specifies the number of vector parameters, or uses VMX instructions.

spare4 One-bit field. Reserved.

gpr_saved Six-bit field. This field is set to the number of nonvolatile general registers that the function saves. As with fp_saved, when traceback unwind and debug information is used, the last register saved is always r31.

fixedparms Eight-bit field. This field is set to the number of fixed-point parameters.

floatparms Seven-bit field. This field is set to the number of floating-point parameters.

parmsonstk One-bit field. This field is set to 1 if all of the parameters are placed in the Parameter Save Area.
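Two of the fields above lend themselves to a short C illustration: the lang codes and the interpretation of fp_saved and gpr_saved. The enum tb_lang and the helper functions below are hypothetical names, not part of the ABI. The sketch does not attempt to model the bit layout of the table itself, which C bit-fields cannot express portably.

```c
#include <assert.h>
#include <stddef.h>

/* Source language codes defined for the lang field. */
enum tb_lang {
    TB_LANG_C = 0, TB_LANG_FORTRAN = 1, TB_LANG_PASCAL = 2, TB_LANG_ADA = 3,
    TB_LANG_PL1 = 4, TB_LANG_BASIC = 5, TB_LANG_LISP = 6, TB_LANG_COBOL = 7,
    TB_LANG_MODULA2 = 8, TB_LANG_CXX = 9, TB_LANG_RPG = 10,
    TB_LANG_PL8_PLIX = 11, TB_LANG_ASSEMBLY = 12, TB_LANG_JAVA = 13,
    TB_LANG_OBJC = 14
};

/* Name for a lang code, or NULL for reserved codes (0xf and above). */
static const char *tb_lang_name(unsigned code)
{
    static const char *const names[] = {
        "C", "Fortran", "Pascal", "Ada", "PL/1", "Basic", "LISP", "COBOL",
        "Modula2", "C++", "RPG", "PL.8, PLIX", "Assembly", "Java",
        "Objective C"
    };
    if (code < sizeof names / sizeof names[0])
        return names[code];
    return NULL;
}

/* Given fp_saved or gpr_saved, return the number of the first saved
   register; the last one saved is always register 31. A result of 32
   means that no registers are saved. */
static unsigned first_saved_register(unsigned count)
{
    assert(count <= 32);
    return 32 - count;
}
```

For example, an fp_saved value of 2 yields 30, matching the statement that f30 and f31 are saved.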