From 6fb7b160a67abfbcd7c6c7aed94b2c4ea55d6d12 Mon Sep 17 00:00:00 2001 From: Bill Schmidt Date: Sat, 28 Apr 2018 16:54:10 -0500 Subject: [PATCH] Changes for third draft (1.5c) of PC-relative addressing changes. Signed-off-by: Bill Schmidt --- specification/bk_main.xml | 13 +- specification/ch_2.xml | 491 ++++++++++++++++++++++++++++++++++---- specification/ch_3.xml | 477 +++--------------------------------- 3 files changed, 500 insertions(+), 481 deletions(-) diff --git a/specification/bk_main.xml b/specification/bk_main.xml index 9fd664f..fc2cddb 100644 --- a/specification/bk_main.xml +++ b/specification/bk_main.xml @@ -57,7 +57,7 @@ Freescale Semiconductor, Inc - Revision 1.5b draft + Revision 1.5c draft OpenPOWER @@ -93,6 +93,17 @@ + + 2018-04-28 + + + + Revision 1.5c: PC-relative addressing third + draft. + + + + 2018-04-13 diff --git a/specification/ch_2.xml b/specification/ch_2.xml index 4f1160e..f3aebeb 100644 --- a/specification/ch_2.xml +++ b/specification/ch_2.xml @@ -4064,6 +4064,426 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> requirements. Some tools may not work with alternate calling sequences and conventions. +
+ Function Call Linkage Protocols + + The compiler (or assembly programmer) and linker cooperate to make + function calls as efficient as possible. Different protocols are + required depending on whether a call is local, whether the caller + and/or callee use a TOC pointer in r2 for code or data accesses + (see ), and whether the + caller and/or callee guarantee to preserve r2. A local + function call is one where the callee is known and visible within + the unit of code being compiled or assembled. A function that + uses a TOC pointer always has a separate local entry point (see + ), + and preserves r2 when called via its local entry point. See for information about encoding + this information in the symbol table entries of functions. + + + summarizes the + protocol requirements for external function calls, and + summarizes the + protocol requirements for local function calls. Each entry in these + tables is further described in the referenced section. + Note that + this ABI does not define protocols where the caller does not use + a TOC pointer, but does preserve r2. It is most efficient when + such functions are always leaf procedures. It is not forbidden for + such a function to call another function, but in this case it is + up to the caller to save and restore r2 around each call.
+
+ Protocols for External Function Calls
+
+ Caller                                 | Callee | PLT stub   | nop needed? | Relocation          | Section link
+ ---------------------------------------+--------+------------+-------------+---------------------+-------------
+ Uses TOC                               | Any    | r2 save    | Yes         | R_PPC64_REL24       |
+ Does not use TOC, does not preserve r2 | Any    | No r2 save | No          | R_PPC64_REL24_NOTOC |
+
+ Protocols for Local Function Calls
+
+ Caller                                 | Callee                                 | Call method    | nop needed? | Relocation          | Section link
+ ---------------------------------------+----------------------------------------+----------------+-------------+---------------------+-------------
+ Uses TOC                               | Uses TOC                               | Local          | No          | R_PPC64_REL24       |
+ Uses TOC                               | Does not use TOC, preserves r2         | Local          | No          | R_PPC64_REL24       |
+ Uses TOC                               | Does not use TOC, does not preserve r2 | r2 save stub   | Yes         | R_PPC64_REL24       |
+ Does not use TOC, does not preserve r2 | Uses TOC                               | r12 setup stub | No          | R_PPC64_REL24_NOTOC |
+ Does not use TOC, does not preserve r2 | Does not use TOC                       | Local          | No          | R_PPC64_REL24_NOTOC |
+
+
+ External Call, Caller Uses TOC + + When a function that uses a TOC pointer makes any call to an + external function, the compiler generates a nop instruction after the + bl instruction for the call. The linker generates a procedure + linkage table (PLT) stub that saves r2 and replaces the nop + instruction with a restore of r2. (The save of r2 may be omitted + from the PLT stub if the R_PPC64_TOCSAVE relocation is used; see + .) If the callee requires + a TOC, the PLT stub also includes code to place the callee's global + entry point into r12. See for a full description of PLT stubs. + +
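The nop replacement described above can be sketched in C. This is only an illustration, not part of the ABI text: the helper name patch_toc_restore is hypothetical, the instruction word is assumed to be already in host byte order, and the two encodings used are 0x60000000 for nop (ori 0,0,0) and 0xE8410018 for ld r2,24(r1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Instruction encodings, as 32-bit words in host byte order. */
#define INSN_NOP         UINT32_C(0x60000000) /* ori 0,0,0    */
#define INSN_LD_R2_24_R1 UINT32_C(0xE8410018) /* ld r2,24(r1) */

/*
 * Hypothetical linker helper: 'slot' points at the instruction that
 * follows a bl to an external function. If the compiler left a nop
 * there, overwrite it with the TOC restore and report success.
 * Anything other than a nop is left untouched.
 */
static bool patch_toc_restore(uint32_t *slot)
{
    assert(slot != NULL);
    if (*slot != INSN_NOP)
        return false;
    *slot = INSN_LD_R2_24_R1;
    return true;
}
```

A real linker would also have to generate the matching PLT stub that saves r2; the sketch covers only the call-site half of the protocol.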
+
+ External Call, Caller Does Not Use TOC, Caller Does Not Preserve + r2 + + When a function that does not use a TOC pointer and does not preserve + r2 makes any call to an + external function, the compiler does not generate a nop instruction + after the bl instruction for the call. Instead, the compiler + annotates the bl instruction with an R_PPC64_REL24_NOTOC + relocation. The linker generates a PLT stub that does not include + a save of r2. If the callee requires a TOC, the PLT stub also + includes code to place the callee's global entry point into r12. + +
+
+ Local Call, Caller Uses TOC, Callee Preserves r2 + + When a function that uses a TOC pointer makes a local call to a + function that also preserves r2, the compiler generates a direct call + to the function's local entry point, and does not generate a nop + instruction after the call. + +
+
+ Local Call, Caller Uses TOC, Callee Does Not Preserve r2 + + When a function that uses a TOC pointer makes a local call to a + function + that does not preserve r2, the compiler generates a nop instruction + after the call. The linker generates a PLT stub that saves r2, but + does not include code to place the callee's global entry point into + r12, and replaces the nop instruction with a restore of r2. (The + save of r2 may be omitted from the PLT stub if the R_PPC64_TOCSAVE + relocation is used; see .) + +
+
+ Local Call, Caller Does Not Preserve r2, Callee Uses TOC + + When a function that does not use a TOC and does not preserve r2 + makes a local call to + a function that requires a TOC pointer, the compiler does not + generate a nop instruction after the bl instruction for the call. + The linker generates a PLT stub that does not include a save of r2, + but does include code to place the callee's global entry point into + r12. The compiler annotates the bl instruction with an + R_PPC64_REL24_NOTOC relocation. + +
+
+ Local Call, Caller Does Not Preserve r2, Callee Does Not Use + TOC + + When a function that does not use a TOC and does not preserve r2 + makes a local call to + a function that does not require a TOC pointer, the compiler + generates a direct call to the function's local entry point, and + does not generate a nop instruction after the call. The compiler + annotates the bl instruction with an R_PPC64_REL24_NOTOC relocation. + +
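The protocol sections above amount to a small decision table, which the following C sketch encodes. It is an illustration under stated assumptions: all type and function names (func_kind, call_method, select_protocol) are hypothetical, and the case of a caller that preserves r2 without using a TOC is rejected because this ABI defines no protocol for it.

```c
#include <assert.h>
#include <stdbool.h>

/* How a function treats r2 (hypothetical classification). */
typedef enum {
    USES_TOC,            /* uses a TOC pointer; preserves r2 via local entry */
    NO_TOC_PRESERVES_R2, /* no TOC pointer, preserves r2 */
    NO_TOC_CLOBBERS_R2   /* no TOC pointer, does not preserve r2 */
} func_kind;

typedef enum {
    CALL_DIRECT,         /* direct call to the local entry point */
    CALL_PLT_R2_SAVE,    /* PLT stub that saves r2 */
    CALL_PLT_NO_R2_SAVE, /* PLT stub without an r2 save */
    CALL_R2_SAVE_STUB,   /* local stub that saves r2, no r12 setup */
    CALL_R12_SETUP_STUB  /* local stub that sets up r12, no r2 save */
} call_method;

typedef struct {
    call_method method;
    bool nop_after_bl; /* compiler emits a nop after the bl */
    bool notoc_reloc;  /* true: R_PPC64_REL24_NOTOC, false: R_PPC64_REL24 */
} call_protocol;

static call_protocol select_protocol(func_kind caller, func_kind callee,
                                     bool is_local)
{
    call_protocol p;

    /* No protocol is defined for this kind of caller. */
    assert(caller != NO_TOC_PRESERVES_R2);

    if (caller == USES_TOC) {
        p.notoc_reloc = false;
        if (!is_local) {
            p.method = CALL_PLT_R2_SAVE;
            p.nop_after_bl = true;
        } else if (callee == NO_TOC_CLOBBERS_R2) {
            p.method = CALL_R2_SAVE_STUB;
            p.nop_after_bl = true;
        } else {
            p.method = CALL_DIRECT;
            p.nop_after_bl = false;
        }
    } else {
        p.notoc_reloc = true;
        p.nop_after_bl = false;
        if (!is_local)
            p.method = CALL_PLT_NO_R2_SAVE;
        else if (callee == USES_TOC)
            p.method = CALL_R12_SETUP_STUB;
        else
            p.method = CALL_DIRECT;
    }
    return p;
}
```

For external calls the callee kind does not affect the result, which matches the "Any" entries in the external call table.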
+
Registers Programs and compilers may freely use all registers except those @@ -4210,9 +4630,9 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> Nonvolatile Register r2 is nonvolatile with respect to calls - between functions in the same compilation unit when the caller requires a TOC - pointer. It is saved and restored by code inserted + between most functions + in the same compilation unit. It is saved and restored by + code inserted by the linker resolving a call to an external function. For more information, see and or Volatile Register r2 is volatile and available for use in a - function whose symbol table entry contains an st_other - field wherein the three most-significant bits have a value - of 001. See + function that does not use a TOC pointer and that does + not preserve r2. See . @@ -4416,8 +4835,9 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> appropriate TOC save and restore code. If the function is called from the same module as the callee, the callee must normally preserve the value of r2. - However, if the callee's symbol table - entry is flagged to indicate the callee does not preserve r2, the + If the callee function is called from + a function in the same compilation unit as the callee, and the + callee does not preserve r2, the caller is responsible for saving and restoring the TOC pointer if it needs it. (See @@ -6380,7 +6800,7 @@ plxv v1, symbol@pcrel By using PC-relative GOT-indirect - addressing (for shared data or very large span from code to data): + addressing (for shared data): @@ -6418,9 +6838,7 @@ lvx v1, 0, r12 Code Models Compilers may provide different code models depending on the expected size of the TOC and the size of the entire executable or - shared library. Assuming that the - TOC pointer is used to address data and/or text, the following - considerations apply: + shared library. Small code model: The TOC is accessed using 16-bit offsets @@ -6464,25 +6882,11 @@ lvx v1, 0, r12 sections is provided in . 
- PC-relative addressing may be used in either the small or the - medium code model, and is identical for both. Accesses to + PC-relative addressing may be used with the medium code model. + Accesses to module-local code and data objects use PC-relative addressing with up to 34-bit offsets. Position-independent code uses PC-relative - GOT-indirect addressing to access other objects in the binary. - If PC-relative addressing span is insufficient to reach any data - item, that access must either be made relative to the TOC - pointer, or a PC-relative indexed form instruction must be used - for the access. PC-relative indexed form instructions provide - up to 64 bits of offset from the current instruction address. - [To discuss: I'm deliberately leaving this flexible for now. - Any concerns? It appears we will probably not see a - load-high-immediate-32 sort of instruction in P10, so we won't - be able to define those kinds of relocs yet.] - - - When linking objects that contain PC-relative relocations, the - linker should attempt to place the .got section near the text - sections. + GOT-indirect addressing to access shared objects.
@@ -6513,11 +6917,16 @@ lvx v1, 0, r12 value. Function pointers shared between modules shall always use the global entry point to specify the address of a function. When a linker causes control to transfer to a global entry point - of a function that requires a TOC - pointer, + of a function that also has a local entry + point, it must insert a glue code sequence that loads r12 with the global - entry-point address. Code at the global entry point can assume that - register r12 points to the GEP. + entry-point address. Code at the global entry point of a function that also has a local entry + point can assume that + register r12 points to the GEP. + However, code at the global entry point of a function that does not + have a separate local entry point cannot make any assumptions about + the values of either r2 or r12. Addresses between the global and local entry points must not be branch targets, either for function entry or referenced by program logic of the function, because a linker may rewrite the code sequence @@ -7479,10 +7888,6 @@ nop . - - For a function call in a function that does not preserve r2, the nop in - need not be generated. - For indirect function calls, the address of the function to be called is placed in r12 and the CTR register. A bctrl instruction is used to perform the indirect branch as shown in @@ -7717,8 +8122,9 @@ bctrl When the callee is in the same compilation unit and is guaranteed to preserve r2. - In both cases, the bl instruction must be marked with an - R_PPC64_REL24_NOTOC relocation. + In the first case, the bl instruction must be marked with an + R_PPC64_REL24_NOTOC relocation. See . For calls to functions resolved at runtime, the linker must generate stub code to load the function address from the PLT. @@ -8651,8 +9057,11 @@ addi r3,r1,p ; R3 = new data area following parameter save area.. - [Ignorant question to discuss: Are there any impacts to unwinding from - new r2 preservation rules?] 
+ When unwinding, care must be taken to restore the TOC pointer r2 if and + only if it has been saved. It is recommended that the unwinder read the + instruction at the return address in the link register and restore r2 + if and only if that instruction is an explicit restore of r2, i.e., + ld r2,24(r1). diff --git a/specification/ch_3.xml b/specification/ch_3.xml index b03e27b..6c64091 100644 --- a/specification/ch_3.xml +++ b/specification/ch_3.xml @@ -270,11 +270,6 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations. - - [To discuss: Alan, is it appropriate to make any adjustments here - in the presence of PC-relative addressing, to get any sections closer - to .text, or are we as ideal as we can get already?] - The medium code model is expected to provide a sufficiently large TOC to provide all data addressing needs of a module with a single TOC. Compilers may generate two-instruction medium code model references @@ -491,427 +486,6 @@ my_func: optimize the prologue sequence. Nor does the absence of this relocation forbid the linker from optimizing the prologue sequence. - 
- Function Call Linkage Protocols - - The compiler (or assembly programmer) and linker cooperate to make - function calls as efficient as possible. Different protocols are - required depending on whether a call is local (caller and callee in - the same compilation unit), whether the caller requires r2 to be - preserved, and whether the callee promises to preserve r2. The - "st_other bits" in the caller's and callee's symbol table entries, - described in , are used to - determine information about r2 preservation requirements. - - - A function that does not require a TOC pointer may have its - st_other bits set to 0 or 1, and its local and global entry points - are the same. If its st_other bits are 0, it preserves r2; if - its st_other bits are 1, it does not promise to do so. It is best - that a function with st_other bits set to 0 does not contain any - function calls; see the Note for st_other 0 in - . - - - summarizes the - protocol requirements for external function calls, and - summarizes the - protocol requirements for local function calls. Each entry in these - tables is further described in the referenced section. - - - Protocols for External Function Calls - - - - - - - - - - - - st_other bits - - - - - PLT stub - - - - - nop needed? - - - - - Relocation - - - - - Section link - - - - - - - Caller - - - - - Callee - - - - - - - - - 1 - - - - - 0–6 - - - - - No r2 save - - - - - No - - - - - R_PPC64_REL24_NOTOC - - - - - - - - - - - - 2–6 - - - - - 0–6 - - - - - r2 save - - - - - Yes - - - - - N/A - - - - - - - - - - -
- - Protocols for Local Function Calls - - - - - - - - - - - - st_other bits - - - - - Call method - - - - - nop needed? - - - - - Relocation - - - - - Section link - - - - - - - Caller - - - - - Callee - - - - - - - - - 1 - - - - - 0–1 - - - - - Local - - - - - No - - - - - R_PPC64_REL24_NOTOC - - - - - - - - - - - - 2–6 - - - - - r12 setup stub - - - - - No - - - - - R_PPC64_REL24_NOTOC - - - - - - - - - - - - 2–6 - - - - - 0 - - - - - Local - - - - - No - - - - - R_PPC64_REL24_NOTOC - - - - - - - - - - - - 1 - - - - - r2 save stub - - - - - Yes - - - - - N/A - - - - - - - - - - - - 2–6 - - - - - Local - - - - - No - - - - - R_PPC64_REL24_NOTOC - - - - - - - - - - -
-
- External Call, Preserving Caller - - When a function that preserves r2 makes any call to an external - function, the compiler generates a nop instruction after the bl - instruction for the call. The linker generates a procedure linkage - table (PLT) stub that saves r2, and replaces the nop instruction with - a restore of r2. If the callee requires a TOC, the PLT stub also - includes code to place the callee's global entry point into r12. - See for a full - description of PLT stubs. - -
-
- External Call, Nonpreserving Caller - - When a function that does not preserve r2 makes any call to an - external function, the compiler does not generate a nop instruction - after the bl instruction for the call. Instead, the compiler - annotates the bl instruction with an R_PPC64_REL24_NOTOC - relocation. The linker generates a PLT stub that does not include - a save of r2. If the callee requires a TOC, the PLT stub also - includes code to place the callee's global entry point into r12. - -
-
- Local Call, Nonpreserving Caller, Callee Needs No TOC - - When a function that does not preserve r2 makes a local call to - a function that does not require a TOC pointer, the compiler - generates a direct call to the function's local entry point, and - does not generate a nop instruction after the call. The compiler - annotates the bl instruction with an R_PPC64_REL24_NOTOC relocation. - -
-
- Local Call, Nonpreserving Caller, Callee Requires TOC - - When a function that does not preserve r2 makes a local call to - a function that requires a TOC pointer, the compiler does not - generate a nop instruction after the bl instruction for the call. - The linker generates a PLT stub that does not include a save of r2, - but does include code to place the callee's global entry point into - r12. - -
-
- Local Call, Preserving Caller, Preserving Callee - - When a function that preserves r2 makes a local call to a function - that also preserves r2, the compiler generates a direct call to the - function's local entry point, and does not generate a nop - instruction after the call. The compiler annotates the bl - instruction with an R_PPC64_REL24_NOTOC relocation. - -
-
- Local Call, Preserving Caller, Nonpreserving Callee - - When a function that preserves r2 makes a local call to a function - that does not preserve r2, the compiler generates a nop instruction - after the call. The linker generates a PLT stub that saves r2, but - does not include code to place the callee's global entry point into - r12, and replaces the nop instruction with a restore of r2. - -
-
Use of the Small Data Area For a data item in the .sdata or .sbss sections, a compiler may @@ -5374,7 +4948,8 @@ my_func: specifies a symbol to be resolved. If the symbol resolves to a function that requires a TOC pointer (as determined by st_other bits) then a link editor must arrange for the - call to be via the entry point of the called function. Any + call to be via the global entry point of the called function. + Any However, if the symbol is resolved by inserting a call to a PLT stub code, the PLT stub code must not rely on the presence of @@ -5388,10 +4963,12 @@ my_func: use. R_PPC64_PCREL_OPT - This relocation type requests that the annotated load or store + This relocation type requests that the annotated instruction and its immediately following instruction be optimized by - the linker when the referenced symbol can be statically resolved. - See for details. + the linker when the referenced symbol can be statically resolved, + or when a more efficient PC-relative sequence can be chosen. + See and + for details.
@@ -5455,16 +5032,15 @@ addi 2,2,.TOC.-func@l requirements as indicated in this section.
Function Call - When present, + Unless the bl instruction is + annotated with an R_PPC64_REL24_NOTOC relocation, the static linker must modify a nop instruction after a bl function call to restore the TOC pointer in r2 from 24(r1) when an external symbol that may use the TOC may be called, as in . - A function must contain a - nop slot after a bl instruction to an external symbol - unless the bl instruction is annotated with - an R_PPC64_REL24_NOTOC relocation. + A function must contain a + nop slot after a bl instruction to an external symbol.
Reference Optimization @@ -5559,11 +5135,34 @@ nop in the sequence. The compiler or programmer must further ensure that the two instructions are not separated by intervening instructions. +
+
+ Optimization of Masked Load/Store Sequences + + PC-relative forms of the pmlxv and pmstxv instructions have a + 28-bit offset, which is too small to guarantee that the offset + will not overflow when relocated within a medium code model + binary. Compilers should not directly generate PC-relative forms + of these instructions, but may instead generate a short sequence + that can be optimized by a linker. For example: + + paddi r12,symbol@pcrel +pmlxvx v1,r10,r12,VRM,MC,P,0 + The previous sequence may be replaced by: + dnop +pmlxv v1,symbol@pcrel(r10),VRM,MC,P,1 + + when the linker determines that the offset from the current + instruction address to symbol's address will fit in 28 bits. + - [To discuss: A possible alternative, due to Alan, is to allow the - code to separate but emit "pld".."lvx;nop" and optimize to - "dnop".."plxv". In this case the PCREL_OPT should be placed on - both groups of insns. Should we pursue?] + Again, this optimization is not universally safe, since it changes + the value of r12 following the data reference. The compiler or + programmer must ensure that the value of r12 is not subsequently + used, and communicate a request for this optimization by placing + an R_PPC64_PCREL_OPT relocation on the first instruction in the + sequence. The compiler or programmer must further ensure that the + two instructions are not separated by intervening instructions.
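The linker's range test for this optimization might look like the C sketch below. It is a sketch under assumptions the text does not state: the 28-bit offset is treated as a signed two's-complement field, insn_addr stands for whichever instruction address the offset is measured from, and the function names fits_signed and can_use_pcrel_masked_form are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if 'value' is representable as a signed field of 'bits' bits. */
static bool fits_signed(int64_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    int64_t limit = (int64_t)1 << (bits - 1);
    return value >= -limit && value < limit;
}

/*
 * Hypothetical linker check: can the paddi + indexed masked access be
 * replaced by the PC-relative form with its 28-bit offset?
 * The unsigned subtraction wraps, and the conversion to int64_t is
 * assumed to behave as on two's-complement targets.
 */
static bool can_use_pcrel_masked_form(uint64_t insn_addr, uint64_t sym_addr)
{
    int64_t offset = (int64_t)(sym_addr - insn_addr);
    return fits_signed(offset, 28);
}
```

When the check fails, the linker simply leaves the original two-instruction sequence in place.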