Object Files

ELF Header

The file class member of the ELF header identification array,
e_ident[EI_CLASS], identifies the ELF file as 64-bit encoded by holding the
value ELFCLASS64.

For a big-endian encoded ELF file, the data encoding member of the
ELF header identification array, e_ident[EI_DATA], holds the value 2,
defined as data encoding ELFDATA2MSB. For a little-endian encoded ELF file,
it holds the value 1, defined as data encoding ELFDATA2LSB.

e_ident[EI_CLASS]   ELFCLASS64    For all 64-bit implementations.
e_ident[EI_DATA]    ELFDATA2MSB   For all big-endian implementations.
e_ident[EI_DATA]    ELFDATA2LSB   For all little-endian implementations.

The ELF header's e_flags member holds bit flags associated with the
file. The 64-bit PowerPC processor family defines the following
flags.

E_flags defining the ABI level:

0   For ELF object files of an unspecified nature.
1   For the Power ELF V1 ABI using function descriptors. This
    ABI is currently only used for big-endian PowerPC
    implementations.
2   For the OpenPOWER ELF V2 ABI using the facilities described
    here and including function pointers to directly reference
    functions.

The ABI version to be used for the ELF header file is specified with
the .abiversion pseudo-op:

    .abiversion 2

Processor identification resides in the ELF header's e_machine
member, and must have the value EM_PPC64, defined as the value 21.

Special Sections

The Special Sections table lists the sections that are used
in the Power Architecture to hold program and control information. It also
shows their types and attributes.

Table: Special Sections

Section Name   Type            Attributes
.got           SHT_PROGBITS    SHF_ALLOC + SHF_WRITE
.toc           SHT_PROGBITS    SHF_ALLOC + SHF_WRITE
.plt (1)       SHT_NOBITS      SHF_ALLOC + SHF_WRITE
.sdata         SHT_PROGBITS    SHF_ALLOC + SHF_WRITE
.sbss          SHT_NOBITS      SHF_ALLOC + SHF_WRITE
.data1         SHT_PROGBITS    SHF_ALLOC + SHF_WRITE
.bss1          SHT_NOBITS      SHF_ALLOC + SHF_WRITE

(1) The type of the OpenPOWER ABI .plt section is SHT_NOBITS, not
SHT_PROGBITS as on most other processors.
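
The Special Sections table can be read as a lookup from section name to the required type and attributes. The following Python sketch encodes it that way, using the generic ELF numeric values (SHT_PROGBITS = 1, SHT_NOBITS = 8, SHF_WRITE = 0x1, SHF_ALLOC = 0x2). The helper name check_special_section, its return convention, and the choice to require an exact attribute match are assumptions made for illustration; they are not defined by this ABI.

```python
# Generic ELF section-header constants (values from the ELF specification).
SHT_PROGBITS = 1
SHT_NOBITS = 8
SHF_WRITE = 0x1
SHF_ALLOC = 0x2

# Required (sh_type, sh_flags) for each special section, per the table.
SPECIAL_SECTIONS = {
    ".got":   (SHT_PROGBITS, SHF_ALLOC | SHF_WRITE),
    ".toc":   (SHT_PROGBITS, SHF_ALLOC | SHF_WRITE),
    ".plt":   (SHT_NOBITS,   SHF_ALLOC | SHF_WRITE),
    ".sdata": (SHT_PROGBITS, SHF_ALLOC | SHF_WRITE),
    ".sbss":  (SHT_NOBITS,   SHF_ALLOC | SHF_WRITE),
    ".data1": (SHT_PROGBITS, SHF_ALLOC | SHF_WRITE),
    ".bss1":  (SHT_NOBITS,   SHF_ALLOC | SHF_WRITE),
}


def check_special_section(name, sh_type, sh_flags):
    """Return a list of mismatch messages (empty if the section conforms).

    Hypothetical helper. Assumption: attributes must match exactly;
    section names not in the table are left unconstrained.
    """
    if name not in SPECIAL_SECTIONS:
        return []
    want_type, want_flags = SPECIAL_SECTIONS[name]
    problems = []
    if sh_type != want_type:
        problems.append(f"{name}: sh_type {sh_type}, expected {want_type}")
    if sh_flags != want_flags:
        problems.append(
            f"{name}: sh_flags {sh_flags:#x}, expected {want_flags:#x}")
    return problems
```

A .plt section emitted as SHT_PROGBITS, as on most other processors, would be reported by this check, while a section such as .text is simply ignored.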

Suggested uses of these special sections follow:

The .got section may hold the Global Offset Table (GOT). This
section is not normally present in a relocatable object file because it
is linker generated. The linker must ensure that .got is aligned to an
8-byte boundary. In an executable or shared library, it may contain
part or all of the TOC.

The .toc section may hold the initialized TOC. The .toc section
must be aligned to an 8-byte boundary. Address elements within .toc
must be aligned to 8-byte boundaries to support linker optimization of
the .toc section. In a relocatable object file, .toc may contain
addresses of objects and functions; in this respect it may be thought
of as a compiler-managed GOT. It may also contain other constants or
variables; in this respect it is like .sdata. In an executable or
shared library, it may contain part or the entirety of the TOC.

The .plt section may hold the procedure linkage table. This
section is not normally present in a relocatable object file because it
is linker generated. Each entry within the .plt section is an 8-byte
address. The linker must ensure that .plt is aligned to an 8-byte
boundary.

The .sdata section may hold initialized small-sized data.

The .sbss section may hold uninitialized small-sized data.

The .data1 section may hold initialized medium-sized data.

The .bss1 section may hold uninitialized medium-sized data.

Tools that support this ABI are not required to use these sections.
However, if a tool uses these sections, it must assign the types and
attributes specified in the Special Sections table. Tools are not
required to use the sections precisely as suggested. Relocation
information and the code that refers to it define the actual use of a
section.

TOC

The TOC is part of the data segment of an executable program.

This section describes a common layout of the TOC in an executable
file or shared object. Particular tools are not required to follow the
layout specified here.

The TOC region commonly includes data items within the .got, .toc,
.sdata, and .sbss sections. In the medium code model, they can be addressed
with 32-bit signed offsets from the TOC pointer register. The TOC pointer
register typically points to the beginning of the .got section + 0x8000,
which permits a 2 GB TOC with the medium and large code models. The .got
section is typically created by the link editor based on @got relocations.
The .toc section is typically included from relocatable object files
referenced during the link phase.

The TOC may straddle the boundary between initialized and
uninitialized data in the data segment. The common order of sections in the
data segment, some of which may be empty, follows:

    .rodata
    .data
    .data1
    .got
    .toc
    .sdata
    .sbss
    .plt
    .bss1
    .bss

The medium code model is expected to provide a sufficiently large TOC
to meet all data addressing needs of a module with a single TOC.

Compilers may generate two-instruction medium code model references
(or, if selected, short displacement one-instruction references) for all
data items that are in the TOC for the object file being compiled. Such
references are relative to the TOC pointer register, r2. (The linker may
optimize two-instruction forms to one-instruction forms, replacing the first
instruction of the two-instruction form with a nop and rewriting the second
instruction. Consequently, the TOC pointer must be live during the first
and second instruction of a two-instruction reference.)

Modules Containing Multiple TOCs

The link editor may create multiple TOCs. In such a case, the
constituent .got, .toc, .sdata, and .sbss sections are conceptually
repeated as necessary, with each TOC typically using a TOC pointer value
of its base plus 0x8000. Any constituent section of type SHT_NOBITS in
any TOC but the last is converted to type SHT_PROGBITS filled with
zeros.

When multiple TOCs are present, linking must take care to save,
initialize, and restore TOC pointers within a single module when calling
from one function to a second function using a different TOC pointer
value. Many of the same issues associated with a cross-module call also
apply to calls within a module that use different TOC pointers.

Symbol Table

Symbol Values

An executable file that contains a symbol reference that is to be
resolved dynamically by an associated shared object will have a symbol
table entry for that symbol. This entry will identify the symbol as
undefined by setting the st_shndx member to SHN_UNDEF.

The OpenPOWER ABI uses the three most-significant bits in the
symbol st_other field to specify the number of instructions (four bytes
each) between a function's global entry point and local entry point. The
global entry point is used when it is necessary to set up the TOC pointer
(r2) for the function. The local entry point is used when r2 is known to
already be valid for the function. A value of zero in these bits asserts
that the function does not use r2.

The values of these three most-significant bits of the st_other
field have the following meanings:

0   The local and global entry points are the same, and the
    function has a single entry point with no requirement on r12 or
    r2. On return, r2 will contain the same value as at
    entry.

    This value should be used for functions that do not
    require the use of a TOC register to access external data. In
    particular, functions that do not access data through the TOC
    pointer can use a common entry point for the local and global
    entry points.

    If the function is not a leaf function, it must
    call subroutines using the R_PPC64_REL24_NOTOC relocation
    to indicate that the TOC register is not initialized. In
    turn, this may lead to more expensive procedure linkage
    table (PLT) stub code than would be necessary if a TOC
    register were initialized.

1   The local and global entry points are the same, and r2
    should be treated as caller-saved for local and global
    callers.

2   The local entry point is at one instruction (four bytes) past the
    global entry point.

    When called at the global entry point, r12 must be set to
    the function entry address. r2 will be set to the TOC base that
    this function needs, so it must be preserved and restored by
    the caller.

    When called at the local entry point, r12 is not used and
    r2 must already point to the TOC base that this function needs,
    and it will be preserved.

3   The local entry point is at two instructions (eight bytes) past the
    global entry point.

    When called at the global entry point, r12 must be set to
    the function entry address. r2 will be set to the TOC base that
    this function needs, so it must be preserved and restored by
    the caller.

    When called at the local entry point, r12 is not used and
    r2 must already point to the TOC base that this function needs,
    and it will be preserved.

4   The local entry point is at four instructions (sixteen bytes) past the
    global entry point.

    When called at the global entry point, r12 must be set to
    the function entry address. r2 will be set to the TOC base that
    this function needs, so it must be preserved and restored by
    the caller.

    When called at the local entry point, r12 is not used and
    r2 must already point to the TOC base that this function needs,
    and it will be preserved.

5   The local entry point is at eight instructions (thirty-two bytes) past
    the global entry point.

    When called at the global entry point, r12 must be set to
    the function entry address. r2 will be set to the TOC base that
    this function needs, so it must be preserved and restored by
    the caller.

    When called at the local entry point, r12 is not used and
    r2 must already point to the TOC base that this function needs,
    and it will be preserved.

6   The local entry point is at 16 instructions (sixty-four bytes) past the
    global entry point.

    When called at the global entry point, r12 must be set to
    the function entry address. r2 will be set to the TOC base that
    this function needs, so it must be preserved and restored by
    the caller.

    When called at the local entry point, r12 is not used and
    r2 must already point to the TOC base that this function needs,
    and it will be preserved.

7   Reserved

The local-entry-point handling field of st_other is generated with
the .localentry pseudo-op. The following is an example using the medium
code model:

        .globl my_func
        .type my_func, @function
    my_func:
        addis r2, r12, (.TOC.-my_func)@ha
        addi r2, r2, (.TOC.-my_func)@l
        .localentry my_func, .-my_func
        ... # function definition
        blr

Functions called via symbols with an st_other value of 0 may be
called without a valid TOC pointer in r2. Symbols of functions that
require a local entry with a valid TOC pointer should generate a symbol
with an st_other field value of 2–6 and both local and global entry
points, even if the global entry point will not be used. (In such a case,
the instructions of the global entry setup sequence may optionally be
initialized with TRAP instructions.)

The value of the three-bit st_other field is determined
from the .localentry directive as follows: If the .localentry
value is 0, the value of st_other is 0. If the .localentry
value is 1, the value of st_other is 1. Otherwise, the value of
st_other is the logarithm (base 2) of the .localentry value.
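
The mapping between .localentry values and the three-bit st_other field can be sketched in Python as follows. The constant and function names are hypothetical, and rejecting .localentry values other than 0, 1, or a power of two from 4 to 64 is an assumption based on the field values 2 through 6 listed earlier; value 7 is treated as an error because it is reserved.

```python
STO_LOCAL_SHIFT = 5                      # field occupies the top three bits
STO_LOCAL_MASK = 0x7 << STO_LOCAL_SHIFT  # of the 8-bit st_other member


def localentry_to_field(value):
    """Map a .localentry value to the three-bit st_other field."""
    if value in (0, 1):
        return value
    # Assumption: remaining legal values are powers of two, 4 to 64 bytes.
    if value < 4 or value > 64 or value & (value - 1):
        raise ValueError(f"unsupported .localentry value: {value}")
    return value.bit_length() - 1        # log2 of an exact power of two


def local_entry_offset(st_other):
    """Return the byte distance from the global to the local entry point."""
    field = (st_other & STO_LOCAL_MASK) >> STO_LOCAL_SHIFT
    if field == 7:
        raise ValueError("st_other local-entry value 7 is reserved")
    # Values 0 and 1 mean both entry points coincide.
    return 0 if field < 2 else 1 << field
```

For the medium code model prologue shown earlier, the two setup instructions occupy eight bytes, so .localentry my_func, .-my_func yields a field value of 3, and decoding that field gives back an offset of 8.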

For very large programs, a 32-bit offset from
the TOC base may not suffice to reach all function addresses. In this
case, the large program model must be used, and the medium code model
entry sequence shown earlier is replaced by:

        .globl my_func
        .type my_func, @function
        .quad .TOC.-my_func
    my_func:
        .reloc ., R_PPC64_ENTRY # optional
        ld r2,-8(r12)
        add r2,r2,r12
        .localentry my_func, .-my_func
        ... # function definition
        blr

The linker will resolve .TOC.-my_func to a
64-bit offset stored 8 bytes prior to the global entry point. The
prologue code then forms the absolute address of the TOC base.

Optionally, the linker may optimize the
prologue sequence for functions that are within 2 GB of the TOC base.
To facilitate this, the compiler may associate an R_PPC64_ENTRY
relocation with the global entry point. Note that this relocation
simply provides a hint, and imposes no obligations on the linker to
optimize the prologue sequence. Nor does the absence of this relocation
forbid the linker from optimizing the prologue sequence.

Use of the Small Data Area

For a data item in the .sdata or .sbss sections, a compiler may
generate short-form one-instruction references. In an executable file or
shared library, such a reference is relative to the address of the TOC
base symbol (which can be obtained from r2 if a TOC pointer is
initialized). A compiler that generates code using the small data area
should provide an option to select the maximum size of objects placed in
the small data area, and a means of disabling any use of the small data
area. When generating code for ELF shared libraries, the small data area
should not be used for default-visibility global objects. This is to
satisfy ELF shared-library symbol interposition rules. That is, an
ordinary global symbol in a shared library may be overridden by a symbol
of the same name defined in the executable or another shared library.
Supporting interposition when using TOC-pointer relative addressing would
require text relocations.

Relocation Types

The relocation entries in a relocatable file are used by the link
editor to transform the contents of that file into an executable file or a
shared object file. The application and result of a relocation are similar
for both. Several relocatable files may be combined into one output file.
The link editor merges the content of the files, sets the value of all
function symbols, and performs relocations.

The 64-bit OpenPOWER Architecture uses Elf64_Rela relocation entries
exclusively. A relocation entry may operate upon a halfword, word, or
doubleword. The r_offset member of the relocation entry designates the
first byte of the address affected by the relocation. The subfield of
r_offset affected by a relocation is implicit in the definition of the
applied relocation type. The r_addend member of the relocation entry serves
as the relocation addend, which is described in the Relocation Table
for each relocation type.

A relocation type defines a set of instructions and calculations
necessary to alter the subfield data of a particular relocation
field.

Relocation Fields

The following relocation fields identify a subfield of an address
affected by a relocation.

Bit numbers are shown at the bottom of the boxes. (Only big-endian
bit numbers are shown for space considerations.) Byte numbers are shown
in the top of the boxes; big-endian byte numbers are displayed in the
upper left corners and little-endian in the upper right corners. The byte
order specified in a relocatable file’s ELF header applies to all the
elements of a relocation entry, the relocation field definitions, and
relocation type calculations.

In the following figure, doubleword64 specifies a 64-bit field
occupying 8 bytes, the alignment of which is 8 bytes unless otherwise
specified.

[Figure: doubleword64, bytes 0–7, field in bits 0–63.]

In the following figure, word32 specifies a 32-bit field taking up
4 bytes and maintaining 4-byte alignment unless otherwise
indicated.

[Figure: word32, bytes 0–3, field in bits 0–31.]

In the following figure, word30 specifies a 30-bit field taking up
bits 0–29 of a word and maintaining 4-byte alignment unless otherwise
indicated.

[Figure: word30, bytes 0–3, field in bits 0–29; bits 30–31 outside the field.]

In the following figure, low24 specifies a 24-bit field taking up
bits 6–29 of a word. The 32-bit word is 4-byte aligned. The other bits
remain unchanged. A call or unconditional branch instruction is an
example of this field.

[Figure: low24, bytes 0–3, field in bits 6–29; bits 0–5 and 30–31 outside the field.]

In the following figure, low21 specifies a 21-bit field occupying
the least-significant bits of a word with 4-byte alignment.

[Figure: low21, bytes 0–3, field in bits 11–31; bits 0–10 outside the field.]

In the following figure, low14 specifies a 14-bit field taking up
bits 16–29 and possibly bit 10 (the branch prediction bit) of a word
and maintaining 4-byte alignment. The other bits remain unchanged. A
conditional branch instruction is an example usage.

[Figure: low14, bytes 0–3, field in bits 16–29, with bit 10 marked separately.]

In the following figure, half16 specifies a 16-bit field taking up
two bytes and maintaining 2-byte alignment. The immediate field of an Add
Immediate instruction is an example of this field.

[Figure: half16, bytes 0–1, field in bits 0–15.]

In the following figure, half16ds is similar to half16, but is
really just 14 bits because the two least-significant bits must be zero
and are not really part of the field. (Used by, for example, the ldu
instruction.) In addition to the use of this relocation field with the DS
forms, half16ds relocations are also used in conjunction with DQ forms.
In those instances, the linker and assembler collaborate to create valid
DQ forms. They raise an error if the specified offset does not meet the
constraints of a valid DQ instruction form displacement.

[Figure: half16ds, bytes 0–1, bits 0–15.]

In the following figure, prefix34 specifies a 34-bit field split
between bits 14-31 and 48-63 of two consecutive words. This is used
by many PC-relative load and store instructions.

[Figure: prefix34, bytes 0–7; high 18 bits in bits 14–31 of the first word, low 16 bits in bits 48–63 of the second word.]
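
As an illustration of the prefix34 layout, the Python sketch below places a signed 34-bit value into a pair of 32-bit instruction words and reads it back. With big-endian bit numbering, bits 14-31 are the low 18 bits of the first word and bits 48-63 are the low 16 bits of the second word. The function names are hypothetical, the range check models a relocation whose field is marked with an asterisk, and the instruction words in the usage note are illustrative values with the displacement bits clear.

```python
def insert_prefix34(prefix_word, suffix_word, value):
    """Place a signed 34-bit value into a prefix34 field.

    Returns the updated (prefix_word, suffix_word) pair; bits outside
    the field are left unchanged.
    """
    if not -(1 << 33) <= value < (1 << 33):
        raise OverflowError("value does not fit in 34 signed bits")
    field = value & 0x3FFFFFFFF                  # two's complement, 34 bits
    prefix_word = (prefix_word & ~0x3FFFF) | (field >> 16)     # high 18 bits
    suffix_word = (suffix_word & ~0xFFFF) | (field & 0xFFFF)   # low 16 bits
    return prefix_word & 0xFFFFFFFF, suffix_word & 0xFFFFFFFF


def extract_prefix34(prefix_word, suffix_word):
    """Recover the signed 34-bit value from a prefix34 field."""
    field = ((prefix_word & 0x3FFFF) << 16) | (suffix_word & 0xFFFF)
    return field - (1 << 34) if field & (1 << 33) else field
```

For example, inserting 0x12345678 into the word pair (0x06100000, 0x38600000) stores 0x1234 in the first word and 0x5678 in the second, and extract_prefix34 returns the original value, including for negative displacements.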

In the following figure, prefix28 specifies a 28-bit field taking up
bits 20-31 and 48-63 of two consecutive words. This is reserved for future use.

[Figure: prefix28, bytes 0–7; high 12 bits in bits 20–31 of the first word, low 16 bits in bits 48–63 of the second word.]

Relocation Notations

The following notations are used in the relocation table.

A
Represents the addend used to compute the value of the
relocatable field.

B
Represents the base address at which a shared object file
has been loaded into memory during execution. Generally, a
shared object file is built with a 0 base virtual address, but
the execution address will be different. See Program Header in
the System V ABI for more information about the base
address.

G
Represents the address in the .TOC. at which the address of
the relocation entry’s symbol plus addend resides during execution.
This implies the creation of a .got section.
Reference in a calculation to the value G implicitly
creates a GOT entry for the indicated symbol.

L
Represents the section offset or address of the procedure
linkage table entry for the symbol. This implies the creation
of a .plt section if one does not already exist. It also
implies the creation of a procedure linkage table (PLT) entry
for resolving the symbol. For an unresolved symbol, the PLT
entry points to a PLT resolver stub. For a resolved symbol, a
procedure linkage table entry holds the final effective address
of a dynamically resolved symbol.

M
Similar to G, except that the address that is stored may
be the address of the procedure linkage table entry for the
symbol.

P
Represents the place (section offset or address) of the
storage unit being relocated (computed using r_offset).

R
Represents the offset of the symbol within the section in
which the symbol is defined (its section-relative
address).

S
Represents the value of the symbol whose index resides in
the relocation entry.

+
Denotes 64-bit modulus addition.

–
Denotes 64-bit modulus subtraction.

>>
Denotes arithmetic right-shifting.

#lo(value)
Denotes the least-significant 16 bits of the indicated
value. That is:

    #lo(x) = (x & 0xffff)

#hi(value)
Denotes bits 16–63 of the indicated value. That is:

    #hi(x) = x >> 16

#ha(value)
Denotes the high adjusted value: bits 16–63 of the
indicated value, compensating for #lo( ) being treated as a
signed number. That is:

    #ha(x) = (x + 0x8000) >> 16

#high(value)
Denotes bits 16–31 of the indicated value. That is:

    #high(x) = (x >> 16) & 0xffff

#higha(value)
Denotes the high adjusted value: bits 16–31 of the
indicated value, compensating for #lo( ) being treated as a
signed number. That is:

    #higha(x) = ((x + 0x8000) >> 16) & 0xffff

#higher(value)
Denotes bits 32–47 of the indicated value. That is:

    #higher(x) = (x >> 32) & 0xffff

#highera(value)
Denotes the higher adjusted value: bits 32–47 of the
indicated value, compensating for #lo( ) being treated as a
signed number. That is:

    #highera(x) = ((x + 0x8000) >> 32) & 0xffff

#highest(value)
Denotes bits 48–63 of the indicated value. That is:

    #highest(x) = x >> 48

#highesta(value)
Denotes the highest adjusted value: bits 48–63 of the
indicated value, compensating for #lo( ) being treated as a
signed number. That is:

    #highesta(x) = (x + 0x8000) >> 48

#lo34(value)
Denotes the least-significant 34 bits of the indicated
64-bit value. That is:

    #lo34(x) = x & 0x3ffffffff

#lo28(value)
Denotes the least-significant 28 bits of the indicated
64-bit value. That is:

    #lo28(x) = x & 0x0fffffff

#hi30(value)
Denotes bits 34–63 of the indicated 64-bit value. That is:

    #hi30(x) = x >> 34

#ha30(value)
Denotes bits 34–63 of the indicated 64-bit value,
compensating for #lo34( ) being treated as a signed
number. That is:

    #ha30(x) = (x + 0x200000000) >> 34

#higher34(value)
Denotes bits 34–49 of the indicated value. That is:

    #higher34(x) = (x >> 34) & 0xffff

#highera34(value)
Denotes the higher adjusted value: bits 34–49 of the
indicated value, compensating for #lo34( ) being treated as a
signed number. That is:

    #highera34(x) = ((x + 0x200000000) >> 34) & 0xffff

#highest34(value)
Denotes bits 50–63 of the indicated value. That is:

    #highest34(x) = x >> 50

#highesta34(value)
Denotes the highest adjusted value: bits 50–63 of the
indicated value, compensating for #lo34( ) being treated as a
signed number. That is:

    #highesta34(x) = (x + 0x200000000) >> 50

TP
The value of the thread pointer in general-purpose
register r13.

TLS_TP_OFFSET
The constant value 0x7000, representing the offset (in
bytes) of the location that the thread pointer is initialized
to point to, relative to the start of the thread local storage
for the first initially available module.

TCB_LENGTH
The constant value 0x8, representing the length of the
thread control block (TCB) in bytes.

tcb
Represents the base address of the TCB.

    tcb = (tp – (TLS_TP_OFFSET + TCB_LENGTH))

dtv
Represents the base address of the dynamic thread vector
(DTV).

    dtv = tcb[0]

dtpmod
Represents the load module index of the load module that
contains the definition of the symbol being relocated and is
used to index the DTV.

dtprel
Represents the offset of the symbol being relocated
relative to the value of dtv[dtpmod].

    dtv[dtpmod] + dtprel = (S + A)

rel16dx
Represents a 16-bit signed offset split across three
fields, as required by the addpcis instruction.

tprel
Represents the offset of the symbol being relocated
relative to the TP.

    tp + tprel = (S + A)

@got@tlsgd
Allocates two contiguous entries in the GOT to hold a
tls_index structure, with values dtpmod and dtprel, and
computes the offset from .TOC. of the first entry.
If n is the offset computed:

    GOT[n] = dtpmod
    GOT[n + 1] = dtprel

The call to __tls_get_addr ( ) happens as:

    __tls_get_addr ((tls_index *) &GOT[n])

@got@tlsld
Allocates two contiguous entries in the GOT to hold a
tls_index structure, with values dtpmod and zero, and computes
the offset from .TOC. of the first entry.
If n is the offset computed:

    GOT[n] = dtpmod
    GOT[n + 1] = 0

The call to __tls_get_addr ( ) happens as:

    __tls_get_addr ((tls_index *) &GOT[n])

@got@tprel
Allocates an entry in the GOT with value tprel, and
computes the offset from .TOC. of the entry.
If n is the offset computed:

    GOT[n] = tprel

The value of tprel is loaded into a register from the
location (GOT + n) to be used in an r2 form instruction.

Relocations flagged with an asterisk (*) will
trigger a relocation failure if the value computed does
not fit in the field specified.

Relocation Types Table

The following rules apply to the relocation types defined in
the Relocation Table:

For relocation types in which the names contain 14 or 16, the
upper 49 bits of the value computed before shifting must all be the
same. For relocation types in which the names contain 24, the upper
39 bits of the value computed before shifting must all be the same.
For relocation types in which the names contain 14 or 24, the low 2
bits of the value computed before shifting must both be zero.

The relocation types whose Field column entry contains an
asterisk (*) are subject to failure if the value computed does not
fit in the allocated bits.

Relocations that refer to half16ds (56–66, 87–88, 91–92,
95–96, and 101–102) are to be used to direct the linker to look
at the underlying instruction and treat the field as a DS or DQ
field. ABI-compliant tools should give an error for attempts to
relocate an address to a value that is not divisible by 4.
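
The range and alignment rules above can be expressed directly as bit tests on the 64-bit value computed before shifting. The following Python sketch does so; the function names are hypothetical, and it applies the rules literally to any type whose name contains 14, 16, or 24, leaving it to the caller to decide whether a given relocation type is subject to the range check (that is, whether its Field entry carries an asterisk).

```python
MASK64 = (1 << 64) - 1


def upper_bits_uniform(value, count):
    """True if the upper `count` bits of a 64-bit value are all 0 or all 1."""
    top = (value & MASK64) >> (64 - count)
    return top == 0 or top == (1 << count) - 1


def check_before_shift(name_bits, value):
    """Check a pre-shift value for a relocation named with 14, 16, or 24.

    Hypothetical helper; raises ValueError to model a relocation failure.
    """
    upper = {14: 49, 16: 49, 24: 39}[name_bits]
    if not upper_bits_uniform(value, upper):
        raise ValueError(f"value {value:#x} out of range for field")
    if name_bits in (14, 24) and value & 0x3:
        raise ValueError(f"value {value:#x} has nonzero low 2 bits")
```

Requiring the upper 49 bits to agree is the same as requiring the value to fit in a signed 16-bit quantity; the 39-bit rule corresponds to a signed 26-bit branch displacement.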

Relocation Table

Relocation values 8, 9, 12, 13, 18, 23, 32, and 247 are not used. This is
to maintain a correspondence to the relocation values used by the 32-bit
PowerPC ELF ABI.

Relocation Name               Value  Field          Expression
R_PPC64_NONE                  0      none           none
R_PPC64_ADDR32                1      word32*        S + A
R_PPC64_ADDR24                2      low24*         (S + A) >> 2
R_PPC64_ADDR16                3      half16*        S + A
R_PPC64_ADDR16_LO             4      half16         #lo(S + A)
R_PPC64_ADDR16_HI             5      half16*        #hi(S + A)
R_PPC64_ADDR16_HA             6      half16*        #ha(S + A)
R_PPC64_ADDR14                7      low14*         (S + A) >> 2
R_PPC64_REL24                 10     low24*         (S + A – P) >> 2
R_PPC64_REL14                 11     low14*         (S + A – P) >> 2
R_PPC64_GOT16                 14     half16*        G – .TOC.
R_PPC64_GOT16_LO              15     half16         #lo(G – .TOC.)
R_PPC64_GOT16_HI              16     half16*        #hi(G – .TOC.)
R_PPC64_GOT16_HA              17     half16*        #ha(G – .TOC.)
R_PPC64_COPY                  19     varies         See Relocation Descriptions.
R_PPC64_GLOB_DAT              20     doubleword64   S + A
R_PPC64_JMP_SLOT              21     doubleword64   See Relocation Descriptions.
R_PPC64_RELATIVE              22     doubleword64   B + A
R_PPC64_UADDR32               24     word32*        S + A
R_PPC64_UADDR16               25     half16*        S + A
R_PPC64_REL32                 26     word32*        S + A – P
R_PPC64_PLT32                 27     word32*        L
R_PPC64_PLTREL32              28     word32*        L – P
R_PPC64_PLT16_LO              29     half16         #lo(L – .TOC.)
R_PPC64_PLT16_HI              30     half16*        #hi(L – .TOC.)
R_PPC64_PLT16_HA              31     half16*        #ha(L – .TOC.)
R_PPC64_SECTOFF               33     half16*        R + A
R_PPC64_SECTOFF_LO            34     half16         #lo(R + A)
R_PPC64_SECTOFF_HI            35     half16*        #hi(R + A)
R_PPC64_SECTOFF_HA            36     half16*        #ha(R + A)
R_PPC64_REL30                 37     word30         (S + A – P) >> 2
R_PPC64_ADDR64                38     doubleword64   S + A
R_PPC64_ADDR16_HIGHER         39     half16         #higher(S + A)
R_PPC64_ADDR16_HIGHERA        40     half16         #highera(S + A)
R_PPC64_ADDR16_HIGHEST        41     half16         #highest(S + A)
R_PPC64_ADDR16_HIGHESTA       42     half16         #highesta(S + A)
R_PPC64_UADDR64               43     doubleword64   S + A
R_PPC64_REL64                 44     doubleword64   S + A – P
R_PPC64_PLT64                 45     doubleword64   L
R_PPC64_PLTREL64              46     doubleword64   L – P
R_PPC64_TOC16                 47     half16*        S + A – .TOC.
R_PPC64_TOC16_LO              48     half16         #lo(S + A – .TOC.)
R_PPC64_TOC16_HI              49     half16*        #hi(S + A – .TOC.)
R_PPC64_TOC16_HA              50     half16*        #ha(S + A – .TOC.)
R_PPC64_TOC                   51     doubleword64   .TOC.
R_PPC64_PLTGOT16              52     half16*        M
R_PPC64_PLTGOT16_LO           53     half16         #lo(M)
R_PPC64_PLTGOT16_HI           54     half16*        #hi(M)
R_PPC64_PLTGOT16_HA           55     half16*        #ha(M)
R_PPC64_ADDR16_DS             56     half16ds*      (S + A) >> 2
R_PPC64_ADDR16_LO_DS          57     half16ds       #lo(S + A) >> 2
R_PPC64_GOT16_DS              58     half16ds*      (G – .TOC.) >> 2
R_PPC64_GOT16_LO_DS           59     half16ds       #lo(G – .TOC.) >> 2
R_PPC64_PLT16_LO_DS           60     half16ds       #lo(L – .TOC.) >> 2
R_PPC64_SECTOFF_DS            61     half16ds*      (R + A) >> 2
R_PPC64_SECTOFF_LO_DS         62     half16ds       #lo(R + A) >> 2
R_PPC64_TOC16_DS              63     half16ds*      (S + A – .TOC.) >> 2
R_PPC64_TOC16_LO_DS           64     half16ds       #lo(S + A – .TOC.) >> 2
R_PPC64_PLTGOT16_DS           65     half16ds*      M >> 2
R_PPC64_PLTGOT16_LO_DS        66     half16ds       #lo(M) >> 2
R_PPC64_TLS                   67     none           none
R_PPC64_DTPMOD64              68     doubleword64   @dtpmod
R_PPC64_TPREL16               69     half16*        @tprel
R_PPC64_TPREL16_LO            70     half16         #lo(@tprel)
R_PPC64_TPREL16_HI            71     half16*        #hi(@tprel)
R_PPC64_TPREL16_HA            72     half16*        #ha(@tprel)
R_PPC64_TPREL64               73     doubleword64   @tprel
R_PPC64_DTPREL16              74     half16*        @dtprel
R_PPC64_DTPREL16_LO           75     half16         #lo(@dtprel)
R_PPC64_DTPREL16_HI           76     half16*        #hi(@dtprel)
R_PPC64_DTPREL16_HA           77     half16*        #ha(@dtprel)
R_PPC64_DTPREL64              78     doubleword64   @dtprel
R_PPC64_GOT_TLSGD16           79     half16*        @got@tlsgd
R_PPC64_GOT_TLSGD16_LO        80     half16         #lo(@got@tlsgd)
R_PPC64_GOT_TLSGD16_HI        81     half16*        #hi(@got@tlsgd)
R_PPC64_GOT_TLSGD16_HA        82     half16*        #ha(@got@tlsgd)
R_PPC64_GOT_TLSLD16           83     half16*        @got@tlsld
R_PPC64_GOT_TLSLD16_LO        84     half16         #lo(@got@tlsld)
R_PPC64_GOT_TLSLD16_HI        85     half16*        #hi(@got@tlsld)
R_PPC64_GOT_TLSLD16_HA        86     half16*        #ha(@got@tlsld)
R_PPC64_GOT_TPREL16_DS        87     half16ds*      @got@tprel
R_PPC64_GOT_TPREL16_LO_DS     88     half16ds       #lo(@got@tprel)
R_PPC64_GOT_TPREL16_HI        89     half16*        #hi(@got@tprel)
R_PPC64_GOT_TPREL16_HA        90     half16*        #ha(@got@tprel)
R_PPC64_GOT_DTPREL16_DS       91     half16ds*      @got@dtprel
R_PPC64_GOT_DTPREL16_LO_DS    92     half16ds       #lo(@got@dtprel)
R_PPC64_GOT_DTPREL16_HI       93     half16*        #hi(@got@dtprel)
R_PPC64_GOT_DTPREL16_HA       94     half16*        #ha(@got@dtprel)
R_PPC64_TPREL16_DS            95     half16ds*      @tprel
R_PPC64_TPREL16_LO_DS         96     half16ds       #lo(@tprel)
R_PPC64_TPREL16_HIGHER        97     half16         #higher(@tprel)
R_PPC64_TPREL16_HIGHERA       98     half16         #highera(@tprel)
R_PPC64_TPREL16_HIGHEST       99     half16         #highest(@tprel)
R_PPC64_TPREL16_HIGHESTA      100    half16         #highesta(@tprel)
R_PPC64_DTPREL16_DS           101    half16ds*      @dtprel
R_PPC64_DTPREL16_LO_DS        102    half16ds       #lo(@dtprel)
R_PPC64_DTPREL16_HIGHER       103    half16         #higher(@dtprel)
R_PPC64_DTPREL16_HIGHERA      104    half16         #highera(@dtprel)
R_PPC64_DTPREL16_HIGHEST      105    half16         #highest(@dtprel)
R_PPC64_DTPREL16_HIGHESTA     106    half16         #highesta(@dtprel)
R_PPC64_TLSGD                 107    none           none
R_PPC64_TLSLD                 108    none           none
R_PPC64_TOCSAVE               109    none           none
R_PPC64_ADDR16_HIGH           110    half16         #high(S + A)
R_PPC64_ADDR16_HIGHA          111    half16         #higha(S + A)
R_PPC64_TPREL16_HIGH          112    half16         #high(@tprel)
R_PPC64_TPREL16_HIGHA         113    half16         #higha(@tprel)
R_PPC64_DTPREL16_HIGH         114    half16         #high(@dtprel)
R_PPC64_DTPREL16_HIGHA        115    half16         #higha(@dtprel)
R_PPC64_REL24_NOTOC           116    low24*         (S + A – P) >> 2
R_PPC64_ADDR64_LOCAL          117    doubleword64   S + A (See Relocation Descriptions.)
R_PPC64_ENTRY                 118    none           none
R_PPC64_PLTSEQ                119    none           none
R_PPC64_PLTCALL               120    none           none
R_PPC64_PLTSEQ_NOTOC          121    none           none
R_PPC64_PLTCALL_NOTOC         122    none           none
R_PPC64_PCREL_OPT             123    none           none
R_PPC64_D34                   128    prefix34*      S + A
R_PPC64_D34_LO                129    prefix34       #lo34(S + A)
R_PPC64_D34_HI30              130    prefix34       #hi30(S + A)
R_PPC64_D34_HA30              131    prefix34       #ha30(S + A)
R_PPC64_PCREL34               132    prefix34*      S + A – P
R_PPC64_GOT_PCREL34           133    prefix34*      G – P
R_PPC64_PLT_PCREL34           134    prefix34*      L – P
R_PPC64_PLT_PCREL34_NOTOC     135    prefix34*      L – P
R_PPC64_ADDR16_HIGHER34       136    half16         #higher34(S + A)
R_PPC64_ADDR16_HIGHERA34      137    half16         #highera34(S + A)
R_PPC64_ADDR16_HIGHEST34      138    half16         #highest34(S + A)
R_PPC64_ADDR16_HIGHESTA34     139    half16         #highesta34(S + A)
R_PPC64_REL16_HIGHER34        140    half16         #higher34(S + A – P)
R_PPC64_REL16_HIGHERA34       141    half16         #highera34(S + A – P)
R_PPC64_REL16_HIGHEST34       142    half16         #highest34(S + A – P)
R_PPC64_REL16_HIGHESTA34      143    half16         #highesta34(S + A – P)
R_PPC64_D28                   144    prefix28*      S + A
R_PPC64_PCREL28               145    prefix28*      S + A – P
R_PPC64_TPREL34               146    prefix34*      @tprel
R_PPC64_DTPREL34              147    prefix34*      @dtprel
R_PPC64_GOT_TLSGD34           148    prefix34*      @got@tlsgd
R_PPC64_GOT_TLSLD34           149    prefix34*      @got@tlsld
R_PPC64_GOT_TPREL34           150    prefix34*      @got@tprel
R_PPC64_GOT_DTPREL34          151    prefix34*      @got@dtprel
R_PPC64_REL16_HIGH            240    half16         #high(S + A – P)
R_PPC64_REL16_HIGHA           241    half16         #higha(S + A – P)
R_PPC64_REL16_HIGHER          242    half16         #higher(S + A – P)
R_PPC64_REL16_HIGHERA         243    half16         #highera(S + A – P)
R_PPC64_REL16_HIGHEST         244    half16         #highest(S + A – P)
R_PPC64_REL16_HIGHESTA        245    half16         #highesta(S + A – P)
R_PPC64_REL16DX_HA            246    rel16dx*       #ha(S + A – P)
R_PPC64_IRELATIVE             248    doubleword64   See Relocation Descriptions.
R_PPC64_REL16                 249    half16*        S + A – P
R_PPC64_REL16_LO              250    half16         #lo(S + A – P)
R_PPC64_REL16_HI              251    half16*        #hi(S + A – P)
R_PPC64_REL16_HA              252    half16*        #ha(S + A – P)
R_PPC64_GNU_VTINHERIT         253
R_PPC64_GNU_VTENTRY           254
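
To make the expression column concrete, the following Python sketch evaluates #lo, #hi, and #ha on a TOC-relative offset and computes the two half16 field values for an R_PPC64_TOC16_HA / R_PPC64_TOC16_LO pair, such as the one used by a two-instruction addis/addi reference. All function names and the addresses in the usage note are hypothetical. Python integers are unbounded, so the sketch first reduces values to signed 64-bit form to model the 64-bit modulus arithmetic and arithmetic right shift described in the notations.

```python
MASK64 = (1 << 64) - 1


def signed64(x):
    """Interpret x modulo 2**64 as a signed 64-bit integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def lo(x):
    return x & 0xFFFF


def hi(x):
    return signed64(x) >> 16             # arithmetic shift of a signed value


def ha(x):
    return signed64(x + 0x8000) >> 16    # compensates for signed #lo


def sign_extend16(v):
    return v - 0x10000 if v & 0x8000 else v


def toc16_ha_lo(symbol, addend, toc_base):
    """Return the (HA, LO) half16 field values for S + A - .TOC."""
    offset = signed64(symbol + addend - toc_base)
    high = ha(offset)
    if not -0x8000 <= high < 0x8000:     # TOC16_HA is marked with an asterisk
        raise OverflowError("R_PPC64_TOC16_HA value does not fit in half16")
    return high & 0xFFFF, lo(offset)
```

With a symbol at 0x10031234 and a TOC base of 0x10028000, the offset is 0x9234 and the fields are (1, 0x9234). Because the low half is sign-extended when the instruction executes, the result is (1 << 16) + (0x9234 - 0x10000), which is again 0x9234; this is why #ha adds 0x8000 before shifting.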
Relocation DescriptionsThe following list describes relocations that can require special
handling or description.R_PPC64_GOT16*These relocation types are similar to the corresponding
R_PPC64_ADDR16* types. However, they refer to the address of the symbol’s
GOT entry and instruct the link editor to build a GOT.R_PPC64_PLTGOT16*These relocation types are similar to the corresponding
R_PPC64_GOT16* types. However, if the link editor
cannot determine the actual value of the symbol, the
GOT entry may contain the address of an entry in the procedure linkage
table. The link editor creates that entry in the procedure linkage table
and stores that address in the GOT entry. This permits lazy resolution of
function symbols at run time. If the link editor
can determine the value of the symbol, it stores that
value in the corresponding GOT entry. The link editor may generate an
R_PPC64_GLOB_DAT relocation as usual.R_PPC64_PLTREL32, R_PPC64_PLTREL64These relocations indicate that reference to a symbol should be
resolved through a call to the symbol’s procedure linkage table entry.
Additionally, it instructs the link editor to build a procedure linkage
table for the executable or shared object if one is not created.R_PPC64_COPYThis relocation type is created by the link editor for dynamic
linking. Its offset member refers to a location in a writable segment.
The symbol table index specifies a symbol that should exist both in the
current relocatable file and in a shared object file. During execution,
the dynamic linker copies data associated with the shared object’s symbol
to the location specified by the offset.
R_PPC64_GLOB_DAT
This relocation type allows determination of the correspondence
between symbols and GOT entries. It is similar to R_PPC64_ADDR64.
However, it sets a GOT entry to the address of the specified
symbol.
R_PPC64_JMP_SLOT
This relocation type is created by the link editor for dynamic
linking. Its offset member gives the location of a procedure linkage
table (PLT) entry. The dynamic linker modifies the PLT entry to transfer
control to the designated symbol’s address (see
).
R_PPC64_RELATIVE
This relocation type is created by the link editor for dynamic
linking. Its offset member gives a location within a shared object that
contains a value representing a relative address. The corresponding
virtual address is computed by the dynamic linker. It adds the virtual
address at which the shared object was loaded to the relative address.
Relocation entries for this type must specify 0 for the symbol table
index.
R_PPC64_IRELATIVE
The link editor creates this relocation type for dynamic linking.
Its addend member specifies the global entry-point location of a resolver
function returning a function pointer. It is used to implement the
STT_GNU_IFUNC framework. The resolver is called, and the returned pointer
is copied into the location specified by the relocation offset
member.
R_PPC64_TLS, R_PPC64_TLSGD, R_PPC64_TLSLD
Used as markers on thread local storage (TLS) code sequences, these
relocations tie the entire sequence with a particular TLS symbol. For
more information, see
.
R_PPC64_TOCSAVE
This relocation type indicates a position where a TOC save may be
inserted in the function to avoid a TOC save as part of the PLT stub
code. A nop can be emitted by a compiler in a function's prologue code. A
link editor can change it to a TOC pointer save instruction. This marker
relocation is placed on the prologue nop and on nops after bl
instructions, with the symbol plus addend pointing to the prologue nop.
If the link editor uses the prologue to save r2, it may omit r2 saves in
the PLT call stub code emitted for calls marked by
R_PPC64_TOCSAVE.
R_PPC64_UADDR*
These relocation types are the same as the corresponding
R_PPC64_ADDR* types, except that the datum to be relocated is allowed to
be unaligned.
R_PPC64_ADDR64_LOCAL
When a separate local entry point exists, this relocation type is
used to initialize a memory location with the address of that local entry
point.
R_PPC64_REL24_NOTOC
This relocation type is used to specify a function call where the
TOC pointer is not initialized. It is similar to R_PPC64_REL24 in that it
specifies a symbol to be resolved. If the
symbol resolves to a function that requires a TOC pointer (as
determined by st_other bits), then a link editor must arrange for the
call to be via the global entry point of the called function.
However, if the symbol is resolved by inserting a call to PLT stub
code, the PLT stub code must not rely on the presence of a valid TOC
base address in TOC register r2 to reference the PLT function table.
R_PPC64_ENTRY
This relocation type may optionally be
associated with a global entry point. See
for discussion of its
use.
R_PPC64_PLTSEQ, R_PPC64_PLTCALL
These relocations mark the instruction as being part of an inline
PLT call sequence in a function where r2 is a valid TOC pointer.
R_PPC64_PLTCALL is used to mark the call instruction, while
R_PPC64_PLTSEQ is used on other instructions in the sequence that
don't have PLT relocations. All instructions in a given sequence
shall have relocations with the same symbol and addend. Note that
R_PPC64_PLTCALL also implicitly marks the nop or TOC-restoring
instruction immediately following the call instruction.
R_PPC64_PLTSEQ_NOTOC, R_PPC64_PLTCALL_NOTOC
These relocations are like the corresponding R_PPC64_PLTSEQ and
R_PPC64_PLTCALL relocations, but are used in functions where r2 is
not a valid TOC pointer. All instructions in the sequence shall use
_NOTOC variant relocations.
R_PPC64_PCREL_OPT
This relocation specifies that the instruction at
r_offset and the instruction at
r_offset + r_addend may be
optimized by the linker; the compiler must guarantee that register
lifetimes are such that the optimization is safe. In both code
sequences where this relocation is valid, the first instruction also
has another relocation at r_offset. The
R_PPC64_PCREL_OPT entry occurs immediately after that relocation in
the table of relocations.
See for more details.
Assembler Syntax
The offset from .TOC. in the GOT where the value of the symbol is
stored is given by the assembly syntax symbol@got. The value of the
symbol alone is the address of the variable named symbol.
For example:
    addis r3, r2, x@got@ha
    ld r3, x@got@l(r3)
Although the Power ISA only defines 16-bit displacements, many TOCs
(and hence a GOT) are larger than 64 KB but fit within 2 GB, which can be
addressed with 32-bit offsets from r2. Therefore, this ABI defines a
simple syntax for 32-bit offsets to the GOT.
The syntaxes SYMBOL@got@ha, SYMBOL@got@h, and SYMBOL@got@l refer to
the high adjusted, high, and low parts of the GOT offset. (For an
explanation of the meaning of “high adjusted,” see
). SYMBOL@got@ha corresponds to
bits 32–63 of the offset within the global offset table with adjustment
for the sign extension of the low-order offset bits. SYMBOL@got@l
corresponds to the 16 low-order bits of the offset within the global
offset table.
The syntax SYMBOL@toc refers to the value (SYMBOL – .TOC.), where
.TOC. represents the TOC base for the current object file. This provides
the address of the variable whose name is SYMBOL as an offset from the
TOC base.
As with the GOT, the syntaxes SYMBOL@toc@ha, SYMBOL@toc@h, and
SYMBOL@toc@l refer to the high adjusted, high, and low parts of the TOC
offset.
The syntax SYMBOL@got@plt may be used to refer to the offset in the
TOC of a procedure linkage table entry stored in the global offset table.
The corresponding syntaxes SYMBOL@got@plt@ha, SYMBOL@got@plt@h, and
SYMBOL@got@plt@l are also defined.
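As a hedged illustration of the high adjusted, high, and low parts described above, the following C sketch splits a 32-bit offset in the way the @ha, @h, and @l operators are described, and then recombines the @ha and @l halves as an addis followed by a D-form displacement would. The function names (part_l, part_h, part_ha, sext16, recombine) are hypothetical and are not defined by this ABI. The sketch assumes offsets no larger than 0x7fff7fff, the largest value that a sign-extended @ha/@l pair can express.

```c
#include <assert.h>
#include <stdint.h>

/* Low 16 bits of the offset (the @l part), as raw bits. */
static uint16_t part_l(int32_t off)
{
    return (uint16_t)((uint32_t)off & 0xffffu);
}

/* High 16 bits of the offset with no adjustment (the @h part). */
static uint16_t part_h(int32_t off)
{
    return (uint16_t)(((uint32_t)off >> 16) & 0xffffu);
}

/* High adjusted 16 bits (the @ha part): adding 0x8000 first compensates
   for the sign extension that is later applied to the low half. */
static uint16_t part_ha(int32_t off)
{
    assert(off <= 0x7fff7fff);   /* ASSUMPTION: stay within @ha/@l reach */
    return (uint16_t)((((uint32_t)off + 0x8000u) >> 16) & 0xffffu);
}

/* Sign-extend a 16-bit field, as the hardware does for D-form immediates. */
static int32_t sext16(uint16_t v)
{
    return (int32_t)(v ^ 0x8000) - 0x8000;
}

/* Models "addis rt, r2, ha" followed by an access using "l(rt)":
   the result is the byte offset that is added to r2. */
static int32_t recombine(uint16_t ha, uint16_t l)
{
    return (int32_t)((int64_t)sext16(ha) * 65536 + sext16(l));
}
```

For the offset 0x12348000, part_h gives 0x1234 while part_ha gives 0x1235: the low half 0x8000 is sign-extended to a negative value, so the adjusted high half must be one larger for the sum to come out right.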
If X is a variable stored in the TOC,
then X@got is the offset within the TOC of a doubleword whose
value is X@toc.
The special symbol .TOC. is used to represent the TOC base for the
current object file.
The following code might appear in a PIC code setup sequence to
compute the distance from a function entry point to the TOC base:
    addis 2,12,.TOC.-func@ha
    addi 2,2,.TOC.-func@l
The syntax SYMBOL@localentry refers to the value of the local
entry point associated with a function symbol. It can be used to
initialize a memory word with the address of the local entry point as
follows:
    .quad func@localentry
Assembler- and Linker-Mediated Executable Optimization
To optimize object code, the assembler and linker may rewrite object
code to implement the function call and return conventions and access to
global and thread-local data. It is the responsibility of compilers and
programmers to generate assembly programs and objects that conform to the
requirements as indicated in this section.
Function Call
Unless the bl instruction is
annotated with an R_PPC64_REL24_NOTOC relocation, the
static linker must modify a nop instruction after a bl function
call to restore the TOC pointer in r2 from 24(r1) when an external symbol
that may use the TOC may be called, as in
.
A function must contain a
nop slot after a bl instruction to an external symbol.
Reference Optimization
References to the GOT may be optimized by rewriting indirect
reference code to replace the reference by an address computation. This
transformation is only performed by the linker when the symbol is known
to be local to the module.
Displacement Optimization for TOC Pointer Relative Accesses
Assemblers and linkers may replace TOC reference code that consists of
two instructions with equivalent code when offset@ha is 0.
TOC reference code:
    addis rt, r2, offset@ha
    lwz rt, offset@l(rt)
Equivalent code:
    nop
    lwz rt, offset(r2)
Compilers and programmers
must ensure that r2 is live at the actual data access
point associated with extended displacement addressing.
TOC Pointer Usage
To enable linker-based optimizations when global data is accessed,
the TOC pointer needs to be available for dereference at the point of all
uses of values derived from the TOC pointer in conjunction with the @l
operator. This property is used by the linker to optimize TOC pointer
accesses. In addition, all reaching definitions for a TOC-pointer-derived
access must compute the same definition.
In some implementations, non-ABI-compliant code may be processed by
providing additional linker options; for example, linker options
disabling linker optimization. However, this behavior in support of
non-ABI-compliant code is not guaranteed to be portable and supported in
all systems.
Compliant example
    addis r4, r2, mysym@toc@ha
    b target
    ...
    addis r4, r2, mysym@toc@ha
target:
    addi r4, r4, mysym@toc@l
    ...
Non-compliant example
    li r4, 0 ; #d1
    b target
    ...
    addis r4, r2, mysym@toc@ha ; #d2
target:
    addi r4, r4, mysym@toc@l ; incompatible definitions #d1 and #d2 reach this
    ...
Table Jump Sequences
Some linkers may rewrite jump table sequences, as described in
. For example, linkers may
rewrite address references created using GOT-indirect loads and bl+4
sequences to use TOC-relative address computation.
Displacement Optimization for PC-Relative Accesses
Compilers and assembly programmers must assume that references to
extern data having unrestricted visibility may be satisfied by a
dynamically linked object, and must therefore use PC-relative
GOT-indirect addressing for such references. A linker may
determine that such a reference is satisfied during static linking
and replace the reference with direct PC-relative addressing.
For example:
    pld r10, symbol@got@pcrel
    lxv vs1, 0(r10)
The previous sequence may be replaced by:
    plxv vs1, symbol@pcrel
    nop
However, this optimization is not universally safe, because after the
rewrite r10 no longer holds the address of symbol. The
compiler or programmer must ensure that the value of r10 is used
neither between the two instructions nor afterward,
and must request this optimization
by placing an R_PPC64_PCREL_OPT relocation on the first instruction
in the sequence; the relocation provides the offset to the second
instruction in the sequence.
Fusion
Whenever possible, compilers, linkers, and programmers should generate
a destructive sequence of two adjacent instructions, an addis followed
by a D-form instruction, to create or load from a 32-bit offset from a
register. This form enables hardware fusion:
    addis r4, r3, upper
    <lbz,lhz,lwz,ld> r4, lower(r4)

    addis r4, r3, upper
    addi r4, r4, lower
Assemblers are encouraged to provide pseudo-ops that facilitate
such code generation with a single assembler mnemonic.
Thread-Local Linker Optimizations
Additional code rewriting is performed by the linker in conjunction
with the use of thread-local storage described in
.
Thread Local Storage ABI
The
ELF Handling for Thread-Local Storage document is the
authoritative TLS ABI specification that defines the context in which
information in the TLS section of this Power Architecture 64-bit ELF V2 ABI
must be viewed. For information about how to access this document, see
. To
maintain congruence with that document, in this section the term module
refers to an executable or shared object since both are treated
similarly.
TLS Background
Most C/C++ implementations support (as an extension to earlier
versions of the language) the keyword __thread to be used as a
storage-class specifier in variable declarations and definitions of data
objects with thread storage duration. (The 2011 ISO C Standard uses
_Thread_local as the keyword, while the 2011 ISO C++ Standard uses
thread_local.) A variable declared in this manner is automatically
allocated local to each thread. Its lifetime is defined to be the entire
execution of the thread. Any initialization value is assigned once before
thread startup.
TLS Runtime Handling
A thread-local variable is completely identified by the module in
which it is defined, along with the offset of the variable relative to
the start of the TLS block for the module. A module is referenced by its
index (an integer starting with 1, which is assigned by the run-time
environment) into the dynamic thread vector (DTV). The offset of the
variable is kept in the st_value field of the TLS variable’s symbol table
entry.
The TLS data structures follow variant I of the ELF TLS ABI. For
the 64-bit PowerPC Architecture, the specific organization of the data
structures is as follows.
The thread control block (TCB) consists of the DTV, which is an
8-byte pointer. An extended TCB may have additional
implementation-specific fields; these fields are located
before the DTV pointer because the addresses are
computed as negative offsets from the TCB address. The fields must never
be rearranged for any reason.
The current glibc extended TCB is:
typedef struct {
  /* Reservation for HWCAP data. */
  unsigned int hwcap2;
  unsigned int hwcap; /* not used in LE ABI */
  /* Indicate if HTM capable (ISA 2.07). */
  int tm_capable;
  int tm_pad;
  /* Reservation for AT_PLATFORM data. */
  uint32_t __unused;
  uint32_t at_platform;
  /* Reservation for dynamic system optimizer ABI. */
  uintptr_t dso_slot2;
  uintptr_t dso_slot1;
  /* Reservation for tar register (ISA 2.07). */
  uintptr_t tar_save;
  /* GCC split stack support. */
  void *__private_ss;
  /* Reservation for the event-based branching ABI. */
  uintptr_t ebb_handler;
  uintptr_t ebb_ctx_pointer;
  uintptr_t ebb_reserved1;
  uintptr_t ebb_reserved2;
  uintptr_t pointer_guard;
  /* Reservation for stack guard */
  uintptr_t stack_guard;
  /* DTV pointer */
  dtv_t *dtv;
} tcbhead_t;
Modules that will not be unloaded will be present at startup time;
the TLS blocks for these are created consecutively and immediately follow
the TCB. The offset of the TLS block of an initially available module
from the TCB remains fixed after program start.
The tlsoffset(m) values for a module with index m, where m ranges
from 1 to M, M being the total number of modules, are computed as follows:
    tlsoffset(1) = round(16, align(1))
    tlsoffset(m + 1) = round(tlsoffset(m) + tlssize(m), align(m + 1))
The function round( ) returns its first argument rounded up to
the next multiple of its second argument:
    round(x, y) = y × ceiling(x / y)
The function ceiling( ) returns the smallest integer greater
than or equal to its argument, where n is an integer satisfying
n – 1 < x ≤ n:
    ceiling(x) = n
In the case of dynamic shared objects (DSO), TLS blocks are
allocated on an as-needed basis, with the details of allocation
abstracted away by the __tls_get_addr( ) function, which is used to
retrieve the address of any TLS variable.
The prototype for the __tls_get_addr( ) function is defined as
follows.
typedef struct
{
  unsigned long int ti_module;
  unsigned long int ti_offset;
} tls_index;
extern void *__tls_get_addr (tls_index *ti);
The thread pointer (TP) is held in r13 and is used to access the
TCB. The TP is initialized to point 0x7000 bytes past the end of the TCB.
The TP offset allows for efficient addressing of the TCB and up to 4 KB – 8 B
of other thread library information (placed before the TCB). shows the region of memory
before and after the TCB that can be efficiently addressed by the
TP.
Each DTV pointer points 0x8000 bytes past the start of each TLS
block. (For implementation reasons, the actual value stored in the DTV
may point to the start of a TLS block. However, values returned by
accessor functions will be offset by 0x8000 bytes.) This offset allows
the first 64 KB of each block to be addressed from a DTV pointer using
fewer machine instructions.
TLS[m] denotes the TLS block for the module with index m. DTV[m]
denotes the DTV pointer for the module with index m.
TLS Access Models
TLS data access is categorized into the following models:
General Dynamic TLS Model
Local Dynamic TLS Model
Initial Exec TLS Model
Local Exec TLS Model
Examples for each access model are provided in the following TLS
Model subsections.
General Dynamic TLS Model
This specification provides examples based on the medium
code model, which is the default for the ELF V2 ABI.
Given the following code fragment, to determine the address of a
thread-local variable x, the __tls_get_addr( ) function is called with one
parameter. That parameter is a pointer to a data object of type
tls_index. Note that different
code generation is used for PC-relative addressing, but the GOT
entry relocations are identical.
    extern __thread unsigned int x;
    &x;
PC-Relative General Dynamic Initial Relocations
Code Sequence                      Relocation            Symbol
pla r3, x@got@tlsgd@pcrel          R_PPC64_GOT_TLSGD34   x
bl __tls_get_addr@notoc(x@tlsgd)   R_PPC64_TLSGD         x
                                   R_PPC64_REL24_NOTOC   __tls_get_addr
General Dynamic GOT Entry Relocations
Code Sequence   Relocation         Symbol
GOT[n]          R_PPC64_DTPMOD64   x
GOT[n+1]        R_PPC64_DTPREL64   x
The relocation specifier @got@tlsgd causes the link editor to
create a data object of type tls_index in the GOT. The address of this
data object is loaded into the first argument register with the addis and
addi instructions, and a standard function call is made. Notice that the
bl instruction has two relocations: the R_PPC64_TLSGD tying it to the
argument setup instructions and the R_PPC64_REL24 or R_PPC64_REL24_NOTOC
specifying the call destination.
Local Dynamic TLS Model
For the Local Dynamic TLS Model, three different relocation
sequences may be used, depending on the size of the thread storage block
offset to the variable. For the following code sequence, a different
relocation sequence is used for each variable.
    static __thread unsigned int x1;
    static __thread unsigned int x2;
    static __thread unsigned int x3;
    &x1;
    &x2;
    &x3;
Local Dynamic GOT Entry Relocations
Code Sequence   Relocation         Symbol
GOT[n]          R_PPC64_DTPMOD64   x1
GOT[n+1]        0
GOT[m]          R_PPC64_DTPREL64   x3
The relocation specifier @got@tlsld in the first instruction causes
the link editor to generate a tls_index data object in the GOT with a
fixed 0 offset. The following code assumes that x1 is in the first 64 KB
of the thread storage block. The x2 symbol is not within the first 64 KB
but is within the first 2 GB, and x3 is outside the 2 GB area. To load
the values of x1, x2, and x3 instead of their addresses, replace the
latter part of
with the following code
sequence.
For PC-relative addressing, replace the latter part of
with the following code
sequence.
Local Dynamic Relocations with Values Loaded
Code Sequence                        Relocation             Symbol
...
lwz r0, x1@dtprel(r3)                R_PPC64_DTPREL16       x1
...
plwz r0, x2@dtprel@pcrel(r3)         R_PPC64_DTPREL34       x2
...
pld r9, x3@got@dtprel@pcrel(r3)      R_PPC64_GOT_DTPREL34   x3
lwzx r0, r3, r9
Initial Exec TLS Model
Given the following code fragment, the relocation sequence in
is used for the Initial Exec
TLS Model:
    extern __thread unsigned int x;
    &x;
The relocation specifier @got@tprel in the first instruction causes
the link editor to generate a GOT entry with a relocation that the
dynamic linker will replace with the offset for x relative to the thread
pointer. The relocation specifier x@tls tells the assembler to use an r13
form of the instruction. That is, add r9,r9,r13 in this case, and tag the
instruction with a relocation that indicates it belongs to a TLS
sequence. This relocation specifier can be used later by the link editor
when optimizing TLS code.
To read the contents of the variable instead of calculating its
address, the add r9, r9, x@tls instruction
in
might be replaced with lwzx r0, r9, x@tls.
The add r9, r9, x@tls@pcrel instruction in might likewise be replaced
with lwzx r0, r9, x@tls@pcrel.
Note that both the x@tls and x@tls@pcrel assembly forms are
annotated with R_PPC64_TLS relocations. To distinguish
between the two, the second of these has a field value
displaced by one byte from the beginning of the instruction.
Local Exec TLS Model
Given the following code fragment, three different relocation
sequences may be used, depending on the size of the offset to the
variable. The sequence in
handles offsets within 60 KB
relative to the end of the TCB (where r13 points 28 KB past the end of
the TCB, which is immediately before the first TLS block). The sequence
in
handles offsets past 60 KB and
less than 2 GB + 28 KB relative to the end of the TCB. The third sequence
is identical to the Initial Exec sequence shown in
.
    static __thread unsigned int x;
    &x;
illustrates which sequence is
used.
Local Exec Initial Relocations (Sequence 1)
Code Sequence            Relocation        Symbol
addi r9, r13, x@tprel    R_PPC64_TPREL16   x
PC-Relative Local Exec Initial Relocations (Sequences 1 and 2)
Code Sequence            Relocation        Symbol
paddi r9, r13, x@tprel   R_PPC64_TPREL34   x
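The offset ranges quoted above for the Local Exec sequences follow from the 0x7000 (28 KB) thread-pointer bias: an offset measured from the end of the TCB becomes a displacement from r13 once the bias is subtracted. The following C sketch is a minimal illustration of that arithmetic, assuming offsets are given in bytes from the end of the TCB as in the text. The names TP_BIAS, le_sequence, tprel_of, and pick_le_sequence are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define TP_BIAS 0x7000  /* r13 points 28 KB past the end of the TCB */

typedef enum {
    LE_SEQ1_TPREL16,   /* displacement from r13 fits in signed 16 bits */
    LE_SEQ2_TPREL32,   /* displacement from r13 fits in signed 32 bits */
    LE_SEQ3_GOT_TPREL  /* otherwise: same shape as the Initial Exec sequence */
} le_sequence;

/* Displacement of the variable from the thread pointer (r13). */
static int64_t tprel_of(uint64_t off_from_tcb_end)
{
    assert(off_from_tcb_end <= (uint64_t)INT64_MAX);
    return (int64_t)off_from_tcb_end - TP_BIAS;
}

/* Classify an offset using the ranges stated in the text:
   below 60 KB, below 2 GB + 28 KB, or anything larger. */
static le_sequence pick_le_sequence(uint64_t off_from_tcb_end)
{
    int64_t tprel = tprel_of(off_from_tcb_end);

    if (tprel >= -0x8000 && tprel <= 0x7fff)
        return LE_SEQ1_TPREL16;
    if (tprel >= INT32_MIN && tprel <= INT32_MAX)
        return LE_SEQ2_TPREL32;
    return LE_SEQ3_GOT_TPREL;
}
```

The 60 KB limit falls out directly: 60 KB minus the 28 KB bias is 32 KB, the first displacement that no longer fits in a signed 16-bit field. In the same way, 2 GB + 28 KB minus the bias is 2 GB, the first displacement that no longer fits in 32 signed bits.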
TLS Link Editor Optimizations
In some cases, the link editor may be able to optimize TLS code
sequences, provided the compiler emits code sequences as
described.
The following TLS link editor transformations are provided as
optimizations to convert between specific TLS access models:
General Dynamic to Initial Exec
General Dynamic to Local Exec
Local Dynamic to Local Exec
Initial Exec to Local Exec
through describe TLS link editor
transformations using a TOC addressing model, and through describe TLS link editor
transformations using a PC-relative addressing model.
General Dynamic to Initial Exec (TOC)
The preceding code and global offset table entries are replaced by
the following code, which makes no reference to GOT entries. The GOT
entries in
can be removed from the GOT by
the linker when performing this code transformation.
To further optimize the code in
, a linker may reschedule the
sequence to exploit fusion by generating a sequence that may be fused
by Power processors:
    nop
    addis r3, r13, x@tprel@ha
    addi r3, r3, x@tprel@l
    nop
Local Dynamic to Local Exec (TOC)
Under this TLS linker optimization, the function call is replaced
with an equivalent code sequence. However, as shown in the following code
examples, the dtprel sequences are left unchanged.
The preceding code and global offset table entries are replaced by
the following code and global offset table entries.
Local-Dynamic-to-Local-Exec Replacement Initial Relocations (TOC)
Code Sequence                    Relocation                   Symbol
nop
addis r3, r13, L@tprel@ha        R_PPC64_TPREL16_HA           link editor generated local symbol
nop
addi r3, r3, L@tprel@l           R_PPC64_TPREL16_LO           link editor generated local symbol
...
addi r9, r3, x1@dtprel           R_PPC64_DTPREL16             x1
...
addis r9, r3, x2@dtprel@ha       R_PPC64_DTPREL16_HA          x2
addi r9, r9, x2@dtprel@l         R_PPC64_DTPREL16_LO          x2
...
addis r9, r2, x3@got@dtprel@ha   R_PPC64_GOT_DTPREL16_HA      x3
ld r9, x3@got@dtprel@l(r9)       R_PPC64_GOT_DTPREL16_LO_DS   x3
add r9, r9, r3
Note: The linker may prefer to schedule the addis and addi
to be adjacent to take advantage of fusion as a
microarchitecture optimization opportunity.
The GOT[n] and GOT[n+1] entries can be removed by the linker after
the code transformation as shown in
.
The local symbol generated by the link editor points to the start
of the thread storage block plus 0x7000 bytes. In practice, a section
symbol with a suitable offset will be used.
Initial Exec to Local Exec (TOC)
This transformation is only performed by the linker when the symbol
is within 2 GB + 28 KB of the thread pointer.
Other sizes and types of thread-local variables may use any of the
X-form indexed load or store instructions. shows how to access the
contents of a variable using the X-form indexed load and store
instructions.
The preceding code and global offset table entries are replaced by
the following code, which makes no reference to GOT entries. The GOT
entries in can be
removed from the GOT by the linker when performing this code
transformation.
Local Dynamic to Local Exec (PC-Relative)
Under this TLS linker optimization, the function call is replaced
with an equivalent code sequence. However, as shown in the following code
examples, the dtprel sequences are left unchanged.
The preceding code and global offset table entries are replaced by
the following code. The global offset table entry GOT[n] can be
removed by the linker.
ELF TLS Definitions
The result of performing a relocation for a TLS symbol is the
module ID and its offset within the TLS block. These are then stored in
the GOT. Later, they are obtained by the dynamic linker at run-time and
passed to __tls_get_addr( ), which returns the address for the variable
for the current thread.
For more information, see
. For TLS relocations, see
.
TLS Relocation Descriptions
The following marker relocations tie together instructions in TLS
code sequences. They allow the link editor to reliably optimize TLS code.
R_PPC64_TLSGD and R_PPC64_TLSLD shall be emitted immediately before their
associated __tls_get_addr call relocation.
R_PPC64_TLS
R_PPC64_TLSGD
R_PPC64_TLSLD
System Support Functions and Extensions
Back Chain
Systems must provide a back chain by default, and they must include
compilers allocating a back chain and system libraries allocating a back
chain. Alternate libraries may be supplied in addition to, and beyond,
but never instead of those providing a back chain. Code generating and
using a back chain shall be the default for compilers, linkers, and
library selection.
Nested Functions
Nested functions that access their ancestors’ stack frames are
entered with r11 initialized to an environment pointer. The environment
pointer is typically a copy of the stack pointer for the most recent
instance of the nested function's parent's stack frame. When a function
pointer to a nested function referencing its outer context is created, an
implementation may create a trampoline to load the present environment
pointer to r11, followed by an unconditional branch to the function code
of the nested function contained in the text segment.
When a trampoline is used, a pointer to a nested function is
represented by the code address of the trampoline.
In some environments, the trampoline code may be created by
allocating memory on the data stack and making at least the pages containing
trampolines executable. In other environments, executable pages may be
prohibited in the stack area for security reasons.
Alternate implementations, such as creating code stacks for
allocating nested function trampolines, may be used. In garbage-collected
environments, yet other ways for managing trampolines are
available.
Traceback Tables
To support debuggers and exception handlers, the 64-bit
OpenPOWER ELF V2 ABI defines the use of descriptive
debug and unwind information that enables flexible debugging and
unwinding of optimized code (for example, DWARF).
To support legacy tooling, the
OpenPOWER ELF V2 ABI also specifies the use of a
traceback table that may provide additional information about
functions. describes a minimal set of
fields that may, optionally, specify information about a function.
Additional fields may be present in a traceback table in accordance with
commonly used PowerPC traceback conventions in other environments, but
they are not specified in the current ABI definition.
Traceback Table Fields
If a traceback table is present, the following fields are
mandatory:
version
Eight-bit field. This defines the type code for the
table. The only currently defined value is zero.
lang
Eight-bit field. This defines the source language for the
compiler that generated the code to which this traceback table
applies. The default values are as follows:
    C             0
    Fortran       1
    Pascal        2
    Ada           3
    PL/1          4
    Basic         5
    LISP          6
    COBOL         7
    Modula2       8
    C++           9
    RPG           10
    PL.8, PLIX    11
    Assembly      12
    Java          13
    Objective C   14
The codes 0xf–0xfa are reserved. The codes
0xfb–0xff are reserved for IBM.
globalink
One-bit field. This field is set to 1 if this routine is
a special routine used to support the linkage convention: a
linkage function including a procedure linkage table function,
pointer glue code, a trampoline, or other compiler- or
linker-generated functions that stack traceback functions
should skip, other than is_eprol functions. For more
information, see
. These routines have
an unusual register usage and stack format.
is_eprol
One-bit field. This field is set to 1 if this routine is
an out-of-line prologue or epilogue function, including a
register save or restore function. Stack traceback functions
should skip these. For more information, see
. These routines have
an unusual register usage and stack format.
has_tboff
One-bit field. This field is set to 1 if the offset of
the traceback table from the start of the function is stored in
the tb_offset field.
int_proc
One-bit field. This field is set to 1 if this function is
a stackless leaf function that does not have a separate stack
frame.
has_ctl
One-bit field. This field is set to 1 if ctl_info is
provided.
tocless
One-bit field. This field is set to 1 if this function
does not have a TOC. For example, a stackless leaf assembly
language routine with no references to external objects.
fp_present
One-bit field. This field is set to 1 if the function
uses floating-point processor instructions.
log_abort
One-bit field. Reserved.
int_handl
One-bit field. Reserved.
name_present
One-bit field. This field is set to 1 if the name for the
procedure is present following the traceback field, as
determined by the name_len and name fields.
uses_alloca
One-bit field. This field is set to 1 if the procedure
performs dynamic stack allocation. To address their local
variables, these procedures require a different register to
hold the stack pointer value. This register may be chosen by
the compiler, and must be indicated by setting the value of the
alloc_reg field.
cl_dis_inv
Three-bit field. Reserved.
saves_cr
One-bit field. This field indicates whether the CR fields
are saved in the CR save word. If traceback tables are used in
place of DWARF unwind information, at least all volatile CR
fields must be saved in the CR save word.
saves_lr
One-bit field. This field is set to 1 if the function
saves the LR in the LR save doubleword.
stores_bc
One-bit field. This field is set to 1 if the function
saves the back chain (the SP of its caller) in the stack frame
header.
fixup
One-bit field. This field is set to 1 if the link editor
replaced the original instruction by a branch instruction to a
special fix-up instruction sequence.
fp_saved
Six-bit field. This field is set to the number of
nonvolatile floating-point registers that the function saves.
When traceback unwind and debug information is used, the last
register saved is always f31. Therefore, for example, a value
of 2 in this field indicates that f30 and f31 are saved.
has_vec_info
One-bit field. This field is set to 1 if the procedure
saves nonvolatile vector registers in the Vector Register Save
Area, specifies the number of vector parameters, or uses VMX
instructions.
spare4
One-bit field. Reserved.
gpr_saved
Six-bit field. This field is set to the number of
nonvolatile general registers that the function saves. As with
fp_saved, when traceback unwind and debug information is used,
the last register saved is always r31.
fixedparms
Eight-bit field. This field is set to the number of
fixed-point parameters.
floatparms
Seven-bit field. This field is set to the number of
floating-point parameters.
parmsonstk
One-bit field. This field is set to 1 if all of the
parameters are placed in the Parameter Save Area.
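The mandatory fields listed above add up to 64 bits. As a hedged sketch, the following C code unpacks them from eight raw bytes. It assumes that the fields are packed in the order listed and that each byte is filled starting from its most significant bit, as big-endian bit-field layouts conventionally do. The text above does not state the bit order, so treat that as an assumption of this example. The names tb_mandatory, tb_bits, and tb_decode are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of the mandatory traceback fields (hypothetical type). */
typedef struct {
    uint8_t version, lang;
    uint8_t globalink, is_eprol, has_tboff, int_proc;
    uint8_t has_ctl, tocless, fp_present, log_abort;
    uint8_t int_handl, name_present, uses_alloca, cl_dis_inv;
    uint8_t saves_cr, saves_lr;
    uint8_t stores_bc, fixup, fp_saved;
    uint8_t has_vec_info, spare4, gpr_saved;
    uint8_t fixedparms, floatparms, parmsonstk;
} tb_mandatory;

/* Extract `width` bits of `byte`, starting `msb_pos` bits below the most
   significant bit (ASSUMPTION: fields are allocated MSB first). */
static uint8_t tb_bits(uint8_t byte, unsigned msb_pos, unsigned width)
{
    assert(width >= 1 && msb_pos + width <= 8);
    return (uint8_t)((byte >> (8 - msb_pos - width)) & ((1u << width) - 1u));
}

/* Unpack the eight mandatory bytes in the order the fields are listed. */
static tb_mandatory tb_decode(const uint8_t raw[8])
{
    tb_mandatory t;
    t.version      = raw[0];
    t.lang         = raw[1];
    t.globalink    = tb_bits(raw[2], 0, 1);
    t.is_eprol     = tb_bits(raw[2], 1, 1);
    t.has_tboff    = tb_bits(raw[2], 2, 1);
    t.int_proc     = tb_bits(raw[2], 3, 1);
    t.has_ctl      = tb_bits(raw[2], 4, 1);
    t.tocless      = tb_bits(raw[2], 5, 1);
    t.fp_present   = tb_bits(raw[2], 6, 1);
    t.log_abort    = tb_bits(raw[2], 7, 1);
    t.int_handl    = tb_bits(raw[3], 0, 1);
    t.name_present = tb_bits(raw[3], 1, 1);
    t.uses_alloca  = tb_bits(raw[3], 2, 1);
    t.cl_dis_inv   = tb_bits(raw[3], 3, 3);
    t.saves_cr     = tb_bits(raw[3], 6, 1);
    t.saves_lr     = tb_bits(raw[3], 7, 1);
    t.stores_bc    = tb_bits(raw[4], 0, 1);
    t.fixup        = tb_bits(raw[4], 1, 1);
    t.fp_saved     = tb_bits(raw[4], 2, 6);
    t.has_vec_info = tb_bits(raw[5], 0, 1);
    t.spare4       = tb_bits(raw[5], 1, 1);
    t.gpr_saved    = tb_bits(raw[5], 2, 6);
    t.fixedparms   = raw[6];
    t.floatparms   = tb_bits(raw[7], 0, 7);
    t.parmsonstk   = tb_bits(raw[7], 7, 1);
    return t;
}
```

Under these assumptions, a table whose fifth byte is 0x82 decodes to stores_bc = 1 and fp_saved = 2, which by the fp_saved description means that f30 and f31 are saved.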