diff --git a/specification/bk_main.xml b/specification/bk_main.xml
index 71c611b..9fd664f 100644
--- a/specification/bk_main.xml
+++ b/specification/bk_main.xml
@@ -57,7 +57,7 @@
Freescale Semiconductor, Inc
- Revision 1.5 draft
+ Revision 1.5b draft
OpenPOWER
@@ -93,6 +93,17 @@
+
+ 2018-04-13
+
+
+
+ Revision 1.5b: PC-relative addressing second
+ draft.
+
+
+
+ 2018-03-14
diff --git a/specification/ch_1.xml b/specification/ch_1.xml
index 4ac5519..1316397 100644
--- a/specification/ch_1.xml
+++ b/specification/ch_1.xml
@@ -179,4 +179,8 @@
+
+ Changes from release 1.4
+ TBD
+
diff --git a/specification/ch_2.xml b/specification/ch_2.xml
index 7b8cbc2..5668293 100644
--- a/specification/ch_2.xml
+++ b/specification/ch_2.xml
@@ -4045,70 +4045,6 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
-
- Global Data Addressing Models
- This specification provides for two global data
- addressing models. The traditional addressing model, which we will call
- "TOC-based," relies on a dedicated table-of-contents (TOC) pointer to
- obtain the addresses of global data. PowerISA version 3.1 introduces new
- "PC-relative" instructions that can be used to obtain the addresses of
- global data relative to the current instruction address (CIA). Code that
- is targeted to run on hardware compliant with PowerISA 3.1 may make use of
- this capability with a "PC-relative" addressing model.
- Each compilation unit must adhere entirely to
- one addressing model or the other. However, it is expressly possible to
- link TOC-based and PC-relative compilation units into a single
- executable, or to dynamically link from a compilation unit with one
- addressing model to a compilation unit with the other addressing model.
- In particular, a PC-relative compilation unit may be linked with an
- existing TOC-based library. Note that a "compilation unit" may consist of
- hand-written assembly code as well as high-level source code.
- Compilers and other tools performing
- link-time optimizations that repackage functions into different
- compilation units must not mix PC-relative and TOC-based functions in
- the same compilation unit. [To discuss: This could be permitted, but
- the value is unclear and it would be likely to spawn occasional
- linker bugs.] Similarly, programmers should not be allowed to
- specify a single function in a TOC-based compilation unit to use the
- PC-relative addressing model or vice versa; for example, using GCC's
- "#pragma target" syntax. [To discuss: How should this be recorded and
- communicated? Perhaps add to e_flags in the ELF header for module
- objects only? We can communicate the need for PC-relative PLT stubs
- to the linker on calls with a reloc, so the linker may not need this,
- but perhaps other tools will?]
- Details of the two addressing models will be
- provided throughout this specification. However, a brief description
- of each is in order.
-
- TOC-Based Addressing Model
- In the traditional TOC-based addressing model,
- each function uses register r2 (see ) to access global memory. A variety
- of techniques, known as TOC-relative, TOC-indirect, GOT-relative, etc.,
- may be used to address the global data, but all these techniques use the
- TOC pointer r2 as part of the data reference.
- With the cooperation of the linker, each
- function in a TOC-based compilation unit is responsible for the
- establishment and maintenance of its own TOC pointer. All functions
- within a compilation unit have the same TOC pointer, so local function
- calls may assume it does not change. An external function call may be
- resolved to a function in a shared object having a different TOC
- pointer, so a caller in a TOC-based compilation unit must save its TOC
- pointer prior to making a call outside the compilation unit, and restore
- its value upon return before the TOC pointer may be used to access global
- data.
-
-
- PC-Relative Addressing Model
- A function in a PC-relative compilation unit
- has no TOC pointer. All accesses to global data are made relative to
- the current instruction address. Since functions in TOC-based
- compilation units are responsible for establishment and maintenance
- of their own TOC pointers, register r2 may be used freely within a
- PC-relative compilation unit, with no need to save or restore the
- register when modifying it.
-
Function Calling Sequence
The standard sequence for function calls is outlined in this section.
@@ -4273,22 +4209,25 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
Nonvolatile
- In a TOC-based
- compilation unit, register r2 is nonvolatile with
- respect to calls between functions in the same compilation
- unit. It is saved and restored by code inserted by the linker
- resolving a call to an external function. For more
- information, see .
+ Register r2 is nonvolatile with respect to calls
+ between functions in the same compilation unit when the caller requires a TOC
+ pointer. It is saved and restored by code inserted
+ by the linker resolving a call to an external function. For
+ more information, see and . or
Volatile
- Register r2 is volatile and available for use in
- PC-relative compilation units.
+ Register r2 is volatile and available for use in a
+ function whose symbol table entry contains an st_other
+ field wherein the three most-significant bits have a value
+ of 001. See
+ .
- TOC pointer for
- TOC-based compilation units.
+ TOC pointer.
@@ -4460,8 +4399,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
TOC Pointer
- Usage (TOC-Based Compilation Units
- Only)
+ Usage
As described in
, the TOC pointer, r2, is
commonly initialized by the global function entry point when a function
@@ -4476,14 +4414,19 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
dynamic linker. For references through function pointers, it is the
compiler's or assembler programmer's responsibility to insert
appropriate TOC save and restore code. If the function is called from
- the same module as the callee, the callee must preserve the value of
- r2. (See
- for a description of function
- entry conventions.)
- When a function calls another function, the TOC pointer must have
- a legal value pointing to the TOC base, which may be initialized as
- described in
- .
+ the same module as the callee, the callee must normally preserve the value of r2.
+ However, if the callee's symbol table
+ entry is flagged to indicate the callee does not preserve r2, the
+ caller is responsible for saving and restoring the TOC pointer if it
+ needs it. (See
+ for more information.)
+ When a function calls another function that requires a TOC pointer, the TOC
+ pointer must have a legal value pointing to the TOC base, which may be
+ initialized as described in .
When global data is accessed, the TOC pointer must be available
for dereference at the point of all uses of values derived from the TOC
pointer in conjunction with the @l operator. This property is used by
@@ -4513,12 +4456,12 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
context.
- When a function is entered through its global entry point,
+ When a function that requires a
+ TOC pointer is entered through its global entry point,
register r12 contains the entry-point address. For more
information, see the description of dual entry points in
- and
-
- .
+
+ and .
@@ -5200,16 +5143,8 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
is volatile over a function call.
TOC Pointer Doubleword
- If a function in a TOC-based
- compilation unit changes the value of the TOC pointer
- register, it shall first save it in the TOC pointer doubleword.
- The TOC pointer doubleword is reserved
- for future use for functions in a PC-relative compilation
- unit. [To discuss: This has implications for alloca, as if we
- reserve it for future use, then the TOC pointer doubleword must be
- copied during a dynamic allocation operation. I suspect it is
- better to suffer that slight penalty rarely in order to have the
- flexibility to use this for another future purpose.]
+ If a function changes the value of the TOC pointer register,
+ it shall first save it in the TOC pointer doubleword.
Optional Save Areas
@@ -6250,6 +6185,20 @@ s6 - 72 (stored)
When instructions hold relative addresses, a program library can be
loaded at various positions in virtual memory and is referred to as a
position-independent code model.
+ When generating code for PowerISA version 3.1
+ or above, this specification provides two ways to address non-local data
+ and text. The historical method relies on a dedicated table-of-contents
+ (TOC) pointer to obtain such addresses. PowerISA version 3.1 introduces
+ new "PC-relative" instructions that can be used to obtain such
+ addresses relative to the current instruction address (CIA). Both
+ methods may be used in the same executable, dynamically shared
+ object (DSO), object file, or even in the same function. If a
+ function does not require a TOC pointer for addressing, it is not required
+ to establish this pointer in register r2, and may choose not to preserve
+ register r2's value provided that the function's symbol table entry is
+ appropriately annotated. Full details of function call linkage
+ requirements are provided in .
Code Model Overview
Executable modules can be built to use either position-dependent or
@@ -6312,9 +6261,9 @@ lvx v1, 0, r12
- pld r12, symbol@pcrel(0), 1
+ pld r12, symbol@pcrel
-plvx v1, symbol@pcrel(0), 1
+plxv v1, symbol@pcrel
In the OpenPOWER ELF V2 ABI, position-dependent code built with
this addressing scheme may have a Global Offset Table (GOT) in the data
segment that holds addresses. (For more information, see
@@ -6355,7 +6304,7 @@ plvx v1, symbol@pcrel(0), 1
references and TOC-pointer initializations can be performed using a
two-instruction sequence.
- PC-relative offsets are always 34 bits for all code models, with
+ PC-relative offsets are usually 34 bits for all code models, with
a maximum addressing reach of 16GB. The effective addressing reach
for global data is 8GB, since data sections are always located at
higher virtual addresses than text sections.
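The 16GB and 8GB figures above follow from treating the 34-bit offset as a signed displacement. The sketch below only checks that arithmetic; the names `OFFSET_BITS` and `fits_pcrel34` are illustrative and do not come from this specification.

```python
# Assumption: the 34-bit PC-relative offset is a signed (two's-complement)
# byte displacement from the current instruction address.
OFFSET_BITS = 34
GIB = 1 << 30

min_offset = -(1 << (OFFSET_BITS - 1))     # most negative displacement
max_offset = (1 << (OFFSET_BITS - 1)) - 1  # most positive displacement

# Reach in both directions combined, and toward higher addresses only.
total_reach_gib = (max_offset - min_offset + 1) // GIB
forward_reach_gib = (max_offset + 1) // GIB


def fits_pcrel34(offset: int) -> bool:
    """Return True if a byte offset is representable in a signed 34-bit field."""
    return min_offset <= offset <= max_offset


print(total_reach_gib, forward_reach_gib)
```

Because data sections sit above text sections, only the forward half of the range is useful for reaching global data, which is where the smaller figure comes from.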
@@ -6425,9 +6374,9 @@ lvx v1, 0, r12
private data).
- pld r12, symbol@pcrel(0), 1
+ pld r12, symbol@pcrel
-plvx v1, symbol@pcrel(0), 1
+plxv v1, symbol@pcrel
By using PC-relative GOT-indirect
@@ -6435,10 +6384,10 @@ plvx v1, symbol@pcrel(0), 1
- pld r12, symbol@got@pcrel(0), 1
+ pld r12, symbol@got@pcrel
ld r12, 0(r12)
-pld r12, symbol@got@pcrel(0), 1
+pld r12, symbol@got@pcrel
lvx v1, 0, r12
A compiler may generate a PC-relative addressing sequence to access
@@ -6450,16 +6399,6 @@ lvx v1, 0, r12
the data reference is satisfied at static link time. See
.
- [To discuss: I'd like to see the assembler
- support "pld r12, symbol@pcrel" as an alternative to "pld r12,
- symbol@pcrel(0), 1", and "pld r12, symbol@got@pcrel" as an
- alternative to "pld r12, symbol@got@pcrel(0), 1". In general, any
- prefix load/store with only two arguments is PC-relative; the
- second argument is either a 34-bit offset or a GPR. Is this
- reasonable or too confusing? Another alternative would be "pld r12,
- symbol@pcrel(cia)" for an offset, and "pld r12, r5, cia" for the
- GPR case. I guess we want something readable that isn't too
- complex for the assembler to sort out.]
Position-independent executables or shared objects have a GOT in
the data segment that holds addresses. When the system creates a memory
image from the file, the GOT entries are updated to reflect the
@@ -6477,11 +6416,11 @@ lvx v1, 0, r12
Code Models
- TOC-Based Compilation
- Units
Compilers may provide different code models depending on the
expected size of the TOC and the size of the entire executable or
- shared library.
+ shared library. Assuming that the
+ TOC pointer is used to address data and/or text, the following
+ considerations apply:
Small code model: The TOC is accessed using 16-bit offsets
@@ -6524,52 +6463,26 @@ lvx v1, 0, r12
TOCs, or by some other method. The suggested allocation order of
sections is provided in
.
- PC-Relative Compilation
- Units
- Compilers may provide different code models depending on the size of
- the entire executable or shared library. There is no small code
- model for PC-relative compilation units.
-
-
-
-
- Medium code model: Accesses to module-local code and data objects
- use PC-relative addressing with 34-bit offsets.
- Position-independent code uses PC-relative GOT-indirect
- addressing to access other objects in the binary.
-
-
-
-
- Large code model: Used when 34-bit offsets are insufficient to
- reach global data or the GOT from at least one text section,
- this is similar to the medium code model, except that up to
- 64-bit PC-relative offsets are used by generating them into a
- register. [To discuss: None of the options for this seem ideal.
- It takes about 5 instructions to generate a 64-bit constant into
- a register, though we can perhaps use linker optimizations to
- replace with a smaller sequence when available. A second choice
- is to place the offset in a .quad in the text section to reach
- the .got entry, but this would incur a load-load dependency.
- (Are there cases where this requires a text relocation resolution
- during dynamic linking?) A third choice is to fail the compile
- and require TOC addressing with large code model when 34-bit
- offsets aren't enough, though that doesn't initially seem
- reasonable. Whatever we choose, we should document the sequence
- and any associated linker optimizations.]
-
-
-
-
- As with TOC-based compilation units, the medium code model is the
- default for compilers, and is applicable to most programs and
- libraries. The code examples in this document generally use the
- medium code model.
+ PC-relative addressing may be used in either the small or the
+ medium code model, and is identical for both. Accesses to
+ module-local code and data objects use PC-relative addressing with
+ up to 34-bit offsets. Position-independent code uses PC-relative
+ GOT-indirect addressing to access other objects in the binary.
+ If the PC-relative addressing span is insufficient to reach a data
+ item, that access must either be made relative to the TOC
+ pointer, or a PC-relative indexed form instruction must be used
+ for the access. PC-relative indexed form instructions provide
+ up to 64 bits of offset from the current instruction address.
+ [To discuss: I'm deliberately leaving this flexible for now.
+ Any concerns? It appears we will probably not see a
+ load-high-immediate-32 sort of instruction in P10, so we won't
+ be able to define those kinds of relocs yet.]
- When linking PC-relative relocatable objects, the linker should
- attempt to place the .got section near the text sections.
+ When linking objects that contain PC-relative relocations, the
+ linker should attempt to place the .got section near the text
+ sections.
@@ -6579,50 +6492,13 @@ lvx v1, 0, r12
section.
Function Prologue
- The function prologue is responsible for
- the following functions:
-
-
- Establishing addressability to global data
-
-
- Creating a stack frame when required
-
-
- Saving any nonvolatile registers that are used by the
- function
-
-
- Saving any limited-access bits that are used by the function,
- per the rules described in
-
-
- This ABI shall be used in conjunction with
- the Power Architecture that implements the
- mfocrf architecture level. Further,
- OpenPOWER-compliant processors shall implement implementation-defined
- bits in a manner to allow the combination of multiple
- mfocrf results with an OR instruction;
- for example, to yield a word in r0 including all three preserved CRs as
- follows:
- mfocrf r0, crf2
-mfocrf r1, crf3
-or r0, r0, r1
-mfocrf r1, crf4
-or r0, r0, r1
- Specifically, this allows each
- OpenPOWER-compliant processor implementation to set each field to hold
- either 0 or the correct in-order value of the corresponding CR field at
- the point where the mfocrf
- instruction is performed.
- TOC-Based Compilation
- Units
- In a TOC-based compilation unit,
- a function's prologue establishes addressability by
+ A function's prologue establishes addressability by
initializing a TOC pointer in register r2, if necessary, and a stack
frame, if necessary, and may save any nonvolatile registers it
- uses.
+ uses. Not all functions must initialize
+ a TOC pointer, and not all functions must preserve the existing value
+ of r2. See for more
+ information.
All functions have a global entry point (GEP) available to any
caller and pointing to the beginning of the prologue. Some functions
may have a secondary entry point to optimize the cost of TOC pointer
@@ -6636,7 +6512,9 @@ or r0, r0, r1
entry point when the r2 register is known to hold a valid TOC base
value. Function pointers shared between modules shall always use the
global entry point to specify the address of a function.
- When a linker causes control to transfer to a global entry point,
+ When a linker causes control to transfer to a global entry point
+ of a function that requires a TOC
+ pointer,
it must insert a glue code sequence that loads r12 with the global
entry-point address. Code at the global entry point can assume that
register r12 points to the GEP.
@@ -6653,10 +6531,9 @@ addi r2, r2, .TOC.-func@l
form that is faster due to instruction fusion, such as:
lis r2, .TOC.@ha
addi r2, r2, .TOC.@l
- In addition to establishing
- addressability, the function prologue
+ In addition to establishing addressability, the function prologue
is responsible for the following functions:
-
+ Creating a stack frame when required
@@ -6670,7 +6547,7 @@ addi r2, r2, .TOC.@l
- This ABI shall be used in conjunction with
+ This ABI shall be used in conjunction with
the Power Architecture that implements the
mfocrf architecture level. Further,
OpenPOWER-compliant processors shall implement implementation-defined
@@ -6678,12 +6555,12 @@ addi r2, r2, .TOC.@l
mfocrf results with an OR instruction; for example,
to yield a word in r0 including all three preserved CRs as
follows:
- mfocrf r0, crf2
+ mfocrf r0, crf2
mfocrf r1, crf3
or r0, r0, r1
mfocrf r1, crf4
or r0, r0, r1
- Specifically, this allows each
+ Specifically, this allows each
OpenPOWER-compliant processor implementation to set each field to hold
either 0 or the correct in-order value of the corresponding CR field at
the point where the mfocrf
@@ -6707,14 +6584,6 @@ or r0, r0, r1
the meaning of the second parameter, which is put in the three
most-significant bits of the st_other field in the ELF Symbol Table
entry.
- PC-Relative Compilation
- Units
-
- In a PC-relative compilation unit, the function prologue does not
- require any setup code to establish addressability to global data.
- Therefore there is also no need for a function to have a separate
- local entry point.
- Function Epilogue
@@ -7438,12 +7307,12 @@ ptr = &dst;
.extern dst
.extern ptr
.section ".text"
-plwz r9, src@pcrel(0), 1
-pstw r9, dst@pcrel(0), 1
-paddi r11, 0, dst@pcrel, 1
-pstd r11, ptr@pcrel(0), 1
-pld r11, ptr@pcrel(0), 1
-plwz r9, src@pcrel(0), 1
+plwz r9, src@pcrel
+pstw r9, dst@pcrel
+paddi r11, dst@pcrel
+pstd r11, ptr@pcrel
+pld r11, ptr@pcrel
+plwz r9, src@pcrel
stw r9, 0(r11)
@@ -7467,8 +7336,8 @@ stw r9, 0(r11)
a signed 32-bit offset from a base register.
- For a PIC code (see
- and
+ For TOC-based PIC
+ code (see and
), the offset in the
Global Offset Table where the value of the symbol is stored is
given by the assembly syntax symbol@got. This syntax represents the
@@ -7611,8 +7480,8 @@ nop
- For a function call in a PC-relative compilation unit, the nop in
- should not be generated.
+ For a function call in a function that does not preserve r2, the nop in
+ need not be generated.
For indirect function calls, the address of the function to be
called is placed in r12 and the CTR register. A bctrl instruction is used
@@ -7688,9 +7557,6 @@ bctrl
shows how to make an indirect
function call using small-model position-independent code.
- Note that the store and reload of the
- TOC pointer r2 is not required in a PC-relative compilation
- unit.
Branching
@@ -8277,12 +8143,7 @@ f1:
shows a switch
- implementation for PC-relative compilation units. [TBD: This needs to
- be a figure, not a table, which may require working with Annette and
- FrameMaker to get something that looks similar to the other figures.
- All we have in the document for the other figures is .png files from
- the old FrameMaker version. Or maybe we should just convert all the
- other figures to tables.]
+ implementation for PC-relative compilation units. [TBD: Formatting]
@@ -8328,7 +8189,7 @@ default:
cmplwi r12, 4
bge .Ldefault
slwi r12, 2
- paddi r10, r0, .Ltab@pcrel, 1
+ paddi r10, .Ltab@pcrel
lwax r8, r10, r12
add r10, r8, r10
mtctr r10
@@ -8416,11 +8277,6 @@ addi r3,r1,p ; R3 = new data area following parameter save area.
-
- It is unnecessary to copy the TOC pointer doubleword for a
- PC-relative compilation unit. [To discuss: Should we, for future
- use of this slot for another purpose?]
- Additional instructions will be necessary for an allocation of
variable size. If a dynamic deallocation will occur, the r1 stack
@@ -8794,6 +8650,10 @@ addi r3,r1,p ; R3 = new data area following parameter save area..
+
+ [Ignorant question to discuss: Are there any impacts to unwinding from
+ new r2 preservation rules?]
+
diff --git a/specification/ch_3.xml b/specification/ch_3.xml
index c5ee16d..4d04a29 100644
--- a/specification/ch_3.xml
+++ b/specification/ch_3.xml
@@ -245,9 +245,7 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations.
TOC
- The TOC is part of the data segment of an executable program
- built from at least one TOC-based object
- file.
+ The TOC is part of the data segment of an executable program.
This section describes a common layout of the TOC in an executable
file or shared object. Particular tools are not required to follow the
layout specified here.
@@ -272,6 +270,11 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations.
+
+ [To discuss: Alan, is it appropriate to make any adjustments here
+ in the presence of PC-relative addressing, to get any sections closer
+ to .text, or are we as ideal as we can get already?]
+ The medium code model is expected to provide a sufficiently large TOC
to provide all data addressing needs of a module with a single TOC.
Compilers may generate two-instruction medium code model references
@@ -306,9 +309,7 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations.
- For TOC-based compilation
- units,the OpenPOWER
- ABI uses the three most-significant bits in the
+ The OpenPOWER ABI uses the three most-significant bits in the
symbol st_other field to specify the number of instructions between a
function's global entry point and local entry point. The global entry
point is used when it is necessary to set up the TOC pointer (r2) for the
@@ -490,6 +491,427 @@ my_func:
optimize the prologue sequence. Nor does the absence of this relocation
forbid the linker from optimizing the prologue sequence.
+
+ Function Call Linkage Protocols
+
+ The compiler (or assembly programmer) and linker cooperate to make
+ function calls as efficient as possible. Different protocols are
+ required depending on whether a call is local (caller and callee in
+ the same compilation unit), whether the caller requires r2 to be
+ preserved, and whether the callee promises to preserve r2. The
+ "st_other bits" in the caller's and callee's symbol table entries,
+ described in , are used to
+ determine information about r2 preservation requirements.
+
+
+ A function that does not require a TOC pointer may have its
+ st_other bits set to 0 or 1, and its local and global entry points
+ are the same. If its st_other bits are 0, it preserves r2; if
+ its st_other bits are 1, it does not promise to do so. It is best
+ that a function with st_other bits set to 0 does not contain any
+ function calls; see the Note for st_other 0 in
+ .
+
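As a rough illustration of the st_other rule in the preceding paragraph, the sketch below pulls out the three most-significant bits and interprets the values 0 and 1. It assumes st_other is the 8-bit ELF symbol field; the function names are hypothetical, and values other than 0 and 1 are deliberately left uninterpreted because their meaning is defined elsewhere in this specification.

```python
def st_other_bits(st_other: int) -> int:
    """Return the three most-significant bits of an 8-bit st_other value."""
    return (st_other >> 5) & 0x7


def r2_behavior(st_other: int) -> str:
    """Describe r2 handling for the two encodings discussed in the text above."""
    bits = st_other_bits(st_other)
    if bits == 0:
        return "single entry point; r2 is preserved"
    if bits == 1:
        return "single entry point; r2 is not guaranteed to be preserved"
    # Other encodings are described with the st_other field itself.
    return "other encoding; not interpreted by this sketch"


print(r2_behavior(0b0010_0000))
```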
+
+ summarizes the
+ protocol requirements for external function calls, and
+ summarizes the
+ protocol requirements for local function calls. Each entry in these
+ tables is further described in the referenced section.
+
+
+
+ External Call, Preserving Caller
+
+ When a function that preserves r2 makes any call to an external
+ function, the compiler generates a nop instruction after the bl
+ instruction for the call. The linker generates a procedure linkage
+ table (PLT) stub that saves r2, and replaces the nop instruction with
+ a restore of r2. If the callee requires a TOC, the PLT stub also
+ includes code to place the callee's global entry point into r12.
+ See for a full
+ description of PLT stubs.
+
+
+
+ External Call, Nonpreserving Caller
+
+ When a function that does not preserve r2 makes any call to an
+ external function, the compiler does not generate a nop instruction
+ after the bl instruction for the call. Instead, the compiler
+ annotates the bl instruction with an R_PPC64_REL24_NOTOC
+ relocation. The linker generates a PLT stub that does not include
+ a save of r2. If the callee requires a TOC, the PLT stub also
+ includes code to place the callee's global entry point into r12.
+
+
+
+ Local Call, Nonpreserving Caller, Callee Needs No TOC
+
+ When a function that does not preserve r2 makes a local call to
+ a function that does not require a TOC pointer, the compiler
+ generates a direct call to the function's local entry point, and
+ does not generate a nop instruction after the call. The compiler
+ annotates the bl instruction with an R_PPC64_REL24_NOTOC relocation.
+
+
+
+ Local Call, Nonpreserving Caller, Callee Requires TOC
+
+ When a function that does not preserve r2 makes a local call to
+ a function that requires a TOC pointer, the compiler does not
+ generate a nop instruction after the bl instruction for the call.
+ The linker generates a PLT stub that does not include a save of r2,
+ but does include code to place the callee's global entry point into
+ r12.
+
+
+
+ Local Call, Preserving Caller, Preserving Callee
+
+ When a function that preserves r2 makes a local call to a function
+ that also preserves r2, the compiler generates a direct call to the
+ function's local entry point, and does not generate a nop
+ instruction after the call. The compiler annotates the bl
+ instruction with an R_PPC64_REL24_NOTOC relocation.
+
+
+
+ Local Call, Preserving Caller, Nonpreserving Callee
+
+ When a function that preserves r2 makes a local call to a function
+ that does not preserve r2, the compiler generates a nop instruction
+ after the call. The linker generates a PLT stub that saves r2, but
+ does not include code to place the callee's global entry point into
+ r12, and replaces the nop instruction with a restore of r2.
+
+
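The six protocols above can be restated as a small lookup table. This is only a reading aid built from the prose in this draft, not a normative encoding; the `Linkage` type and the key strings are invented for the example, and `None` marks the cases where the text makes the r12 setup conditional on the callee requiring a TOC.

```python
from typing import NamedTuple, Optional


class Linkage(NamedTuple):
    nop_after_bl: bool             # compiler emits a nop after the bl
    plt_stub: bool                 # linker generates a PLT stub
    stub_saves_r2: bool            # stub saves r2; the nop becomes a restore
    stub_sets_r12: Optional[bool]  # None: only if the callee requires a TOC


PROTOCOLS = {
    "external, preserving caller":
        Linkage(True, True, True, None),
    "external, nonpreserving caller":
        Linkage(False, True, False, None),
    "local, nonpreserving caller, callee needs no TOC":
        Linkage(False, False, False, False),
    "local, nonpreserving caller, callee requires TOC":
        Linkage(False, True, False, True),
    "local, preserving caller, preserving callee":
        Linkage(False, False, False, False),
    "local, preserving caller, nonpreserving callee":
        Linkage(True, True, True, False),
}

# In every case described, a nop is emitted exactly when the stub saves r2.
consistent = all(p.nop_after_bl == p.stub_saves_r2 for p in PROTOCOLS.values())
print(consistent)
```

Laying the cases out this way makes the pairing visible: the nop exists solely to give the linker a slot for the r2 restore that matches the stub's save.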
+
Use of the Small Data Area
For a data item in the .sdata or .sbss sections, a compiler may
@@ -2069,71 +2491,325 @@ my_func:
-
- 0
-
+
+ 0
+
+
+ 1
+
+
+ 2
+
+
+ 3
+
+
+ 4
+
+
+ 5
+
+
+ 6
+
+
+ 7
+
+
+ 8
+
+
+ 9
+
+
+ 10
+
+
+ 11
+
+
+ 12
+
+
+ 13
+
+
+ 14
+
+
+ 15
+
+
+
+
+
+
+ In the following figure, prefix34 specifies a 34-bit field split
+ between bits 14-31 and 48-63 of a doubleword. The other bits
+ remain unchanged. This is used by many PC-relative load and store
+ instructions.
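A minimal sketch of how a linker might apply a prefix34 field, under two assumptions that are not stated in the paragraph above: bit 0 is the most-significant bit of the doubleword, and the upper 18 bits of the value go into bits 14-31 while the lower 16 bits go into bits 48-63. The helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1

# With bit 0 as the most-significant bit, bits 14-31 sit at shifts 32..49
# and bits 48-63 sit at shifts 0..15 of the 64-bit value.
PREFIX34_MASK = (0x3FFFF << 32) | 0xFFFF


def insert_prefix34(dword: int, value: int) -> int:
    """Write a 34-bit value into the prefix34 field, leaving other bits alone."""
    field = value & ((1 << 34) - 1)
    high18 = field >> 16
    low16 = field & 0xFFFF
    return (dword & ~PREFIX34_MASK & MASK64) | (high18 << 32) | low16


def extract_prefix34(dword: int) -> int:
    """Read the prefix34 field back as a signed 34-bit value."""
    field = (((dword >> 32) & 0x3FFFF) << 16) | (dword & 0xFFFF)
    return field - (1 << 34) if field & (1 << 33) else field


print(hex(insert_prefix34(0, -8)))
```

The prefix34ds and prefix34dq variants described later would use the same split, with the low two or three bits of the value required to be zero.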
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ prefix34
+
+
+
+
+ 0
+
+
+ 13
+
+
+ 14
+
+
+
+
+
+ 31
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ prefix34 (continued)
+
+
+
+
+ 32
+
+
+
+
+
+ 47
+
+
+ 48
+
+
+ 63
+
+
+
+
+
+
+ In the following figure, prefix34ds is similar to prefix34, but is
+ really just 32 bits because the two least-significant bits must be
+ zero and are not really part of the field. This is used, for example,
+ by the pld instruction.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ prefix34ds
+
+
+
+
+ 0
+
+
+ 13
+
+
+ 14
+
+
+
+
+
+
+
+
+
+
+
+ 31
+
+
+
- 1
+
- 2
+
-
- 3
+
+
- 4
+
-
- 5
+
+
- 6
+
- 7
+
-
- 8
+
+
+
+
+
+
+
+
+
+
+
+
+ prefix34ds (continued)
-
- 9
+
+
+
+
+
+
+
+
+
+ 32
- 10
+
-
- 11
+
+ 47
-
- 12
+
+ 48
-
- 13
+
+ 61
-
- 14
+
+ 62
- 15
+ 63
- In the following figure, prefix34 specifies a 34-bit field split
- between bits 14-31 and 48-63 of a doubleword. The other bits
- remain unchanged. This is used by PC-relative load and store
- instructions.
+ In the following figure, prefix34dq is similar to prefix34, but is
+ really just 31 bits because the three least-significant bits must be
+ zero and are not really part of the field. This is used, for example,
+ by the plxv instruction.
-
+
-
-
+
+
+
+
@@ -2151,6 +2827,12 @@ my_func:
+
+
+
+
+
+
@@ -2159,8 +2841,8 @@ my_func:
-
- prefix34
+
+ prefix34dq
@@ -2176,6 +2858,12 @@ my_func:
+
+
+
+
+
+ 31
@@ -2193,6 +2881,12 @@ my_func:
+
+
+
+
+
+
@@ -2207,9 +2901,15 @@ my_func:
-
- prefix34 (continued)
+
+ prefix34dq (continued)
+
+
+
+
+
+
@@ -2224,6 +2924,12 @@ my_func:
48
+
+ 60
+
+
+ 61
+ 63
@@ -2232,37 +2938,29 @@ my_func:
- In the following figure, prefix34ds is similar to prefix34, but is
- really just 32 bits because the two least-significant bits must be
- zero and are not really part of the field. This is used, for example,
- by the pldu instruction. In addition to the use of this relocation
- field with the DS forms, prefix34ds relocations are also used in
- conjunction with DQ forms, such as the plq instruction. In those
- instances, the linker and assembler collaborate to create valid DQ
- forms. They raise an error if the specified offset does not meet the
- constraints of a valid DQ instruction form displacement.
+ In the following figure, prefix28dq specifies a 25-bit field split
+ between bits 20-31 and 48-60 of a doubleword. The other bits
+ remain unchanged, and the 25-bit field is assumed to be concatenated
+ with three zero bits on the right to form a 28-bit offset. This is
+ used, for example, by the pmlxv instruction.
-
-
-
-
-
-
+
+
+
+
+
+
-
-
-
-
-
+
@@ -2279,28 +2977,28 @@ my_func:
+
+
+
-
- prefix34ds
+
+ prefix28dq
0
-
- 13
-
-
- 14
-
-
-
+
+ 19
+
+
+ 20
@@ -2310,9 +3008,6 @@ my_func:
-
-
-
@@ -2333,17 +3028,14 @@ my_func:
-
-
-
-
- prefix34ds (continued)
+
+ prefix28dq (continued)
@@ -2356,9 +3048,6 @@ my_func:
32
-
-
- 47
@@ -2366,10 +3055,10 @@ my_func:
48
- 61
+ 60
- 62
+ 61
63
@@ -2382,12 +3071,6 @@ my_func:
Relocation Notations
The following notations are used in the relocation table.
-
- [There seem to be a number of missing notations in this table. We
- have #higher[a], #highest[a], and got, and perhaps the @ notation
- could use further description. Also, there is some usage of #high and
- #higha instead of #hi and #ha, which I assume is a mistake.]
-
@@ -2525,7 +3208,8 @@ my_func:
#hi(value)
- Denotes bits 16–63 of the indicated value. That
+ Denotes bits 16–31 of the indicated value. That
is:
#hi(x) = x >> 16
@@ -2535,12 +3219,57 @@ my_func:
#ha(value)
- Denotes the high adjusted value: bits 16–63 of the
+ Denotes the high adjusted value: bits
+ 16–31 of the
indicated value, compensating for #lo( ) being treated as a
signed number. That is:
#ha(x) = (x + 0x8000) >> 16
+
+
+ #higher(value)
+
+
+ Denotes bits 32–47 of the indicated value. That
+ is:
+ #higher(x) = x >> 32
+
+
+
+
+ #highera(value)
+
+
+ Denotes the higher adjusted value: bits 32–47
+ of the
+ indicated value, compensating for #hi( ) being treated as a
+ signed number. That is:
+ #highera(x) = (x + 0x80000000) >> 32
+
+
+
+
+ #highest(value)
+
+
+ Denotes bits 48–63 of the indicated value. That
+ is:
+ #highest(x) = x >> 48
+
+
+
+
+ #highesta(value)
+
+
+ Denotes the highest adjusted value: bits 48–63
+ of the
+ indicated value, compensating for #higher( ) being treated as a
+ signed number. That is:
+ #highesta(x) = (x + 0x800000000000) >> 48
+
+ TP
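The unadjusted notations and #ha can be checked with a few lines of arithmetic. This sketch assumes that #lo(x) is the low 16 bits of x (its definition is outside this hunk) and masks every result to the 16-bit range named in the descriptions; the adjusted #highera and #highesta forms are left out on purpose.

```python
MASK16 = 0xFFFF


def lo(x: int) -> int:
    return x & MASK16                      # assumed: bits 0-15 of the value


def hi(x: int) -> int:
    return (x >> 16) & MASK16              # bits 16-31


def ha(x: int) -> int:
    return ((x + 0x8000) >> 16) & MASK16   # bits 16-31, adjusted for signed #lo


def higher(x: int) -> int:
    return (x >> 32) & MASK16              # bits 32-47


def highest(x: int) -> int:
    return (x >> 48) & MASK16              # bits 48-63


def sign_extend16(v: int) -> int:
    """Interpret a 16-bit field as a signed number, as an addi would."""
    return v - 0x10000 if v & 0x8000 else v


x = 0x12348765
# #ha plus the sign-extended #lo reproduces the original 32-bit value.
rebuilt = (ha(x) << 16) + sign_extend16(lo(x))
print(hex(hi(x)), hex(ha(x)), hex(rebuilt))
```

Here #hi yields 0x1234 but #ha yields 0x1235, because the low half 0x8765 is negative once sign-extended and the high half has to absorb the borrow.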
@@ -4206,7 +4935,7 @@ my_func:
half16
- #high(S + A)
+ #hi(S + A)
@@ -4220,7 +4949,7 @@ my_func:
half16
- #higha(S + A)
+ #ha(S + A)
@@ -4234,7 +4963,9 @@ my_func:
half16
- #high(@tprel)
+
+ #hi(@tprel)
+
@@ -4248,7 +4979,9 @@ my_func:
half16
- #higha(@tprel)
+
+ #ha(@tprel)
+
@@ -4262,7 +4995,9 @@ my_func:
half16
- #high(@dtprel)
+
+ #hi(@dtprel)
+
@@ -4276,7 +5011,9 @@ my_func:
half16
- #higha(@dtprel)
+
+ #ha(@dtprel)
+
@@ -4426,10 +5163,10 @@ my_func:
R_PPC64_PCREL34
- 256?
+ 256
- prefix34
+ prefix34*@pcrel
@@ -4440,7 +5177,7 @@ my_func:
R_PPC64_PCREL34_DS
- 257?
+ 257 prefix34ds*
@@ -4449,15 +5186,43 @@ my_func:
@pcrel >> 2
+
+
+ R_PPC64_PCREL34_DQ
+
+
+ 258
+
+
+ prefix34dq*
+
+
+ @pcrel >> 3
+
+
+
+
+ R_PPC64_PCREL28_DQ
+
+
+ 259
+
+
+ prefix28dq*
+
+
+ @pcrel >> 3
+
+ R_PPC64_GOT_PCREL34
- 258?
+ 260
- prefix34
+ prefix34*@got@pcrel
@@ -4468,7 +5233,7 @@ my_func:
R_PPC64_GOT_PCREL34_DS
- 259?
+ 261 prefix34ds*
@@ -4477,12 +5242,40 @@ my_func:
@got@pcrel >> 2
+
+
+ R_PPC64_GOT_PCREL34_DQ
+
+
+ 262
+
+
+ prefix34dq*
+
+
+ @got@pcrel >> 3
+
+
+
+
+ R_PPC64_GOT_PCREL28_DQ
+
+
+ 263
+
+
+ prefix28dq*
+
+
+ @got@pcrel >> 3
+
+ R_PPC64_PCREL_OPT
- 260?
+ 264
@@ -4494,11 +5287,6 @@ my_func:
-
- [To discuss: Assuming we build up 64-bit PC-relative offsets into a
- register using shifts/adds, we'll need the #lo, #ha, #higher[a],
- #highest[a] relocs to be defined also.]
- Relocation Descriptions
@@ -4583,10 +5371,16 @@ my_func:
R_PPC64_REL24_NOTOC
This relocation type is used to specify a function call where the
TOC pointer is not initialized. It is similar to R_PPC64_REL24 in that it
- specifies a symbol to be resolved. However, if the symbol is resolved by
- inserting a call to a PLT stub code, the PLT stub code must not rely on
- the presence of a valid TOC base address in TOC register r2 to reference
- the PLT function table.
+ specifies a symbol to be resolved. If the
+ symbol resolves to a function that requires a TOC pointer (as
+ determined by st_other bits) then a link editor must arrange for the
+ call to be via the entry point of the called function.
+ However, if the symbol is resolved by
+ inserting a call to a PLT stub code, the PLT stub code must
+ not rely on the presence of
+ a valid TOC base address in TOC
+ register r2 to reference
+ the PLT function table.
R_PPC64_ENTRY
This relocation type may optionally be
associated with a global entry point. See
@@ -4595,7 +5389,7 @@ my_func:
R_PPC64_PCREL_OPT
This relocation type requests that the annotated load or store
- instruction and its immediately preceding instruction be optimized by
+ instruction and its immediately following instruction be optimized by
the linker when the referenced symbol can be statically resolved.
See for details.
@@ -4661,15 +5455,16 @@ addi 2,2,.TOC.-func@l
requirements as indicated in this section.
Function Call
- For TOC-based compilation
- units,the
+ When present,
+ the
static linker must modify a nop instruction after a bl function
call to restore the TOC pointer in r2 from 24(r1) when an external symbol
that may use the TOC may be called, as in
.
- TOC-based
- object files must contain a
- nop slot after a bl instruction to an external symbol.
+ A function must contain a
+ nop slot after a bl instruction to an external symbol
+ unless the bl instruction is annotated with
+ an R_PPC64_REL24_NOTOC relocation.
Reference Optimization
@@ -4750,33 +5545,25 @@ target:
and replace the reference with direct PC-relative addressing.
For example:
- pld r12, symbol@got@pcrel(0), 1
+ pld r12, symbol@got@pcrel
lvx v1, 0, r12
The previous sequence may be replaced by:
- nop
-plvx v1, symbol@pcrel(0), 1
+ plxv v1, symbol@pcrel
+nop
However, this optimization is not universally safe, since it
changes the value of r12 following the data reference. The
compiler or programmer must ensure that the value of r12 is not
subsequently used, and communicate a request for this optimization
- by placing a RELOC_PPC64_PCREL_OPT on the second instruction in
- the sequence. The compiler or programmer must further ensure that
+ by placing an R_PPC64_PCREL_OPT relocation on the first instruction
+ in the sequence. The compiler or programmer must further ensure that
the two instructions are not separated by intervening instructions.
- [To discuss: This optimization is crucial for making PC-relative
- performance good enough to replace TOC-relative addressing. I
- thought about allowing the compiler to separate the two instructions,
- and place an instruction-distance value in the
- RELOC_PPC64_PCREL_OPT relocation field, but ultimately I think this
- becomes difficult to implement, and I hope that the load-from-DSO
- case is infrequent enough that the load-load dependency won't kill
- us. Definitely need other opinions/ideas here.]
-
-
- [To discuss: Can we add optimizations for PC-relative offsets built
- for large code model? Only applies if we use shift/add sequences.]
+ [To discuss: A possible alternative, due to Alan, is to allow the
+ code to separate but emit "pld".."lvx;nop" and optimize to
+ "dnop".."plxv". In this case the PCREL_OPT should be placed on
+ both groups of insns. Should we pursue?]
diff --git a/specification/ch_4.xml b/specification/ch_4.xml
index 98b15e5..3b5a9a6 100644
--- a/specification/ch_4.xml
+++ b/specification/ch_4.xml
@@ -698,20 +698,25 @@ PPC_FEATURE_HAS_VSX 0x00000080 /* P7 Vector Extension. */
PPC_FEATURE_PSERIES_PERFMON_COMPAT 0x00000040
PPC_FEATURE_TRUE_LE 0x00000002
PPC_FEATURE_PPC_LE 0x00000001
+ Bit 0x00000004 is reserved for kernel use.
+ AT_HWCAP2
The a_val member of this entry is a bit map of hardware
capabilities. Some bit mask values include:
- PPC_FEATURE2_ARCH_2_07 0x80000000 /* ISA 2.07 */
-PPC_FEATURE2_HAS_HTM 0x40000000 /* Hardware Transactional Memory */
-PPC_FEATURE2_HAS_DSCR 0x20000000 /* Data Stream Control Register */
-PPC_FEATURE2_HAS_EBB 0x10000000 /* Event Base Branching */
-PPC_FEATURE2_HAS_ISEL 0x08000000 /* Integer Select */
-PPC_FEATURE2_HAS_TAR 0x04000000 /* Target Address Register */
-PPC_FEATURE2_HAS_VCRYPTO 0x02000000 /* The processor implements the
- Vector.AES category */
-PPC_FEATURE2_HTM_NOSC 0x01000000
-PPC_FEATURE2_ARCH_3_00 0x00800000 /* ISA 3.0 */
-PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */
+ PPC_FEATURE2_ARCH_2_07 0x80000000 /* ISA 2.07 */
+PPC_FEATURE2_HAS_HTM 0x40000000 /* Hardware Transactional Memory */
+PPC_FEATURE2_HAS_DSCR 0x20000000 /* Data Stream Control Register */
+PPC_FEATURE2_HAS_EBB 0x10000000 /* Event Base Branching */
+PPC_FEATURE2_HAS_ISEL 0x08000000 /* Integer Select */
+PPC_FEATURE2_HAS_TAR 0x04000000 /* Target Address Register */
+PPC_FEATURE2_HAS_VCRYPTO 0x02000000 /* The processor implements the
+ Vector.AES category */
+PPC_FEATURE2_HTM_NOSC 0x01000000
+PPC_FEATURE2_ARCH_3_00 0x00800000 /* ISA 3.0 */
+PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */
+PPC_FEATURE2_DARN 0x00200000 /* darn instruction */
+PPC_FEATURE2_SCV 0x00100000 /* scv syscall */
+PPC_FEATURE2_HTM_NO_SUSPEND 0x00080000 /* TM without suspended state */
When a process starts to execute, its stack holds the arguments,
environment, and auxiliary vector received from the exec call. The system
makes no guarantees about the relative arrangement of argument strings,
@@ -797,10 +802,6 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */
- The 8-byte header value is undefined when all linked compilation units
- are PC-relative.
-
The link editor shall emit dynamic relocations as appropriate for each
entry in the GOT. At runtime, the dynamic linker will apply these
relocations after the addresses of all memory segments are known (and
@@ -816,10 +817,7 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */
- When at least one TOC-based
- compilation unit is to be linked,
- the
- symbol .TOC. may be used to access the GOT or in TOC-relative
+ The symbol .TOC. may be used to access the GOT or in TOC-relative
addressing to other data constructs, such as the procedure linkage table.
The symbol may be offset by 0x8000 bytes, or another offset, from the
start of the .got section. This offset allows the use of the full (64 KB)
@@ -830,15 +828,15 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */
- In PIC code, the TOC pointer r2 points to the TOC base, enabling
+ In PIC code that uses the
+ TOC, the TOC pointer r2 points to the TOC base, enabling
easy reference. For static nonrelocatable modules, the GOT address is
fixed and can be directly used by code.
- All functions in TOC-based
- compilation units except leaf routines must load the value of
- the TOC base into the TOC register r2.
+ All functions except leaf routines must
+ load the value of the TOC base into the TOC register r2.
- Functions in PC-relative compilation units access GOT entries directly
- using PC-relative addressing.
+ Code may access GOT entries directly using PC-relative addressing,
+ where available.
@@ -998,13 +996,6 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */
-
-
- The caller is PC-relative and does not need to save the TOC
- pointer. [To discuss: Do we need a relocation, or will we have
- a module-level bit the linker can detect?]
-
- In any scenario, the PLT call stub must transfer control to the
function whose address is provided in the associated PLT entry. This
@@ -1053,14 +1044,12 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */
- A possible implementation for case 4 looks as follows:
+ When PC-relative addressing is available, a simpler variant
+ may be used instead for cases 2 or 3:
- pld r12, func@plt@got@pcrel(0), 1
+ pld r12, func@plt@pcrel
mtctr r12
bctr
-
- [To discuss: Is that the right assembly syntax?]
- To support lazy binding, the link editor also provides a set of
symbol resolver stubs, one for each PLT entry. Each resolver stub
consists of a single instruction, which is usually a branch to a common
@@ -1133,10 +1122,7 @@ bctr
After resolution, the value of a PLT entry in the PLT is the
address of the function’s global entry point, unless the resolver
can determine that a module-local call occurs with a shared TOC value
- wherein the TOC is shared between the caller and the
- callee,
- or a module-local call occurs in a
- PC-relative compilation unit. [?]
+ wherein the TOC is shared between the caller and the callee.