Second draft of PC-relative addressing changes.

Signed-off-by: Bill Schmidt <wschmidt@linux.ibm.com>
master
Bill Schmidt 7 years ago
parent 16ef9435f5
commit 508ef6ce66

@@ -57,7 +57,7 @@
<holder>Freescale Semiconductor, Inc</holder> <holder>Freescale Semiconductor, Inc</holder>
</copyright> </copyright>
<!-- TODO: Set the correct document releaseinfo --> <!-- TODO: Set the correct document releaseinfo -->
<releaseinfo>Revision 1.5 draft</releaseinfo> <releaseinfo>Revision 1.5b draft</releaseinfo>
<productname>OpenPOWER</productname> <productname>OpenPOWER</productname>
<pubdate/> <pubdate/>


@@ -93,6 +93,17 @@


<revhistory> <revhistory>
<!-- TODO: Set the initial version information and clear any old information out --> <!-- TODO: Set the initial version information and clear any old information out -->
<revision>
<date>2018-04-13</date>
<revdescription>
<itemizedlist spacing="compact">
<listitem>
<para>Revision 1.5b: PC-relative addressing second
draft.</para>
</listitem>
</itemizedlist>
</revdescription>
</revision>
<revision> <revision>
<date>2018-03-14</date> <date>2018-03-14</date>
<revdescription> <revdescription>

@@ -179,4 +179,8 @@
</listitem> </listitem>
</itemizedlist> </itemizedlist>
</section> </section>
<section revisionflag="added">
<title>Changes from release 1.4</title>
<para>TBD</para>
</section>
</chapter> </chapter>

@@ -4045,70 +4045,6 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
</section> </section>
</section> </section>
</section> </section>
<section revisionflag="added" xml:id="dbdoclet.50655240_AddrModel">
<title revisionflag="added">Global Data Addressing Models</title>
<para revisionflag="added">This specification provides for two global data
addressing models. The traditional addressing model, which we will call
"TOC-based," relies on a dedicated table-of-contents (TOC) pointer to
obtain the addresses of global data. PowerISA version 3.1 introduces new
"PC-relative" instructions that can be used to obtain the addresses of
global data relative to the current instruction address (CIA). Code that
is targeted to run on hardware compliant with PowerISA 3.1 may make use of
this capability with a "PC-relative" addressing model.</para>
<para revisionflag="added">Each compilation unit must adhere entirely to
one addressing model or the other. However, it is expressly possible to
link TOC-based and PC-relative compilation units into a single
executable, or to dynamically link from a compilation unit with one
addressing model to a compilation unit with the other addressing model.
In particular, a PC-relative compilation unit may be linked with an
existing TOC-based library. Note that a "compilation unit" may consist of
hand-written assembly code as well as high-level source code.</para>
<para revisionflag="added">Compilers and other tools performing
link-time optimizations that repackage functions into different
compilation units must not mix PC-relative and TOC-based functions in
the same compilation unit. [To discuss: This could be permitted, but
the value is unclear and it would be likely to spawn occasional
linker bugs.] Similarly, programmers should not be allowed to
specify a single function in a TOC-based compilation unit to use the
PC-relative addressing model or vice versa; for example, using GCC's
"#pragma target" syntax. [To discuss: How should this be recorded and
communicated? Perhaps add to e_flags in the ELF header for module
objects only? We can communicate the need for PC-relative PLT stubs
to the linker on calls with a reloc, so the linker may not need this,
but perhaps other tools will?]</para>
<para revisionflag="added">Details of the two addressing models will be
provided throughout this specification. However, a brief description
of each is in order.</para>
<section revisionflag="added" xml:id="dbdoclet.50655240_TOCBased">
<title revisionflag="added">TOC-Based Addressing Model</title>
<para revisionflag="added">In the traditional TOC-based addressing model,
each function uses register r2 (see <xref
linkend="dbdoclet.50655240_68174" />) to access global memory. A variety
of techniques, known as TOC-relative, TOC-indirect, GOT-relative, etc.,
may be used to address the global data, but all these techniques use the
TOC pointer r2 as part of the data reference.</para>
<para revisionflag="added">With the cooperation of the linker, each
function in a TOC-based compilation unit is responsible for the
establishment and maintenance of its own TOC pointer. All functions
within a compilation unit have the same TOC pointer, so local function
calls may assume it does not change. An external function call may be
resolved to a function in a shared object having a different TOC
pointer, so a caller in a TOC-based compilation unit must save its TOC
pointer prior to making a call outside the compilation unit, and restore
its value upon return before the TOC pointer may be used to access global
data.</para>
</section>
<section revisionflag="added" xml:id="dbdoclet.50655240_PCRel">
<title revisionflag="added">PC-Relative Addressing Model</title>
<para revisionflag="added">A function in a PC-relative compilation unit
has no TOC pointer. All accesses to global data are made relative to
the current instruction address. Since functions in TOC-based
compilation units are responsible for establishment and maintenance
of their own TOC pointers, register r2 may be used freely within a
PC-relative compilation unit, with no need to save or restore the
register when modifying it.</para>
</section>
</section>
<section xml:id="dbdoclet.50655240_85672"> <section xml:id="dbdoclet.50655240_85672">
<title>Function Calling Sequence</title> <title>Function Calling Sequence</title>
<para>The standard sequence for function calls is outlined in this section. <para>The standard sequence for function calls is outlined in this section.
@@ -4273,22 +4209,25 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
</entry> </entry>
<entry> <entry>
<para>Nonvolatile<footnote> <para>Nonvolatile<footnote>
<para><phrase revisionflag="changed">In a TOC-based <para>Register r2 is nonvolatile with respect to calls
compilation unit, register</phrase> r2 is nonvolatile with between functions in the same compilation unit <phrase
respect to calls between functions in the same compilation revisionflag="added">when the caller requires a TOC
unit. It is saved and restored by code inserted by the linker pointer</phrase>. It is saved and restored by code inserted
resolving a call to an external function. For more by the linker resolving a call to an external function. For
information, see <xref linkend="dbdoclet.50655240_51083" more information, see <xref linkend="dbdoclet.50655240_51083"
/>.</para> /> <phrase revisionflag="added"> and <xref
linkend="dbdoclet.50655241_FnLinkage" /></phrase>.</para>
</footnote><phrase revisionflag="added"> or </footnote><phrase revisionflag="added"> or
Volatile<footnote> Volatile<footnote>
<para>Register r2 is volatile and available for use in <para>Register r2 is volatile and available for use in a
PC-relative compilation units.</para> function whose symbol table entry contains an st_other
field wherein the three most-significant bits have a value
of 001. See
<xref linkend="dbdoclet.50655241_FnLinkage" />.</para>
</footnote></phrase></para> </footnote></phrase></para>
</entry> </entry>
<entry> <entry>
<para>TOC pointer <phrase revisionflag="added"> for <para>TOC pointer.</para>
TOC-based compilation units</phrase>.</para>
</entry> </entry>
</row> </row>
<row> <row>
@@ -4460,8 +4399,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
</table> </table>
<para>&#160;</para> <para>&#160;</para>
<bridgehead xml:id="dbdoclet.50655240_51083">TOC Pointer <bridgehead xml:id="dbdoclet.50655240_51083">TOC Pointer
Usage <phrase revisionflag="added">(TOC-Based Compilation Units Usage</bridgehead>
Only)</phrase></bridgehead>
<para>As described in <para>As described in
<xref linkend="dbdoclet.50655241_73385" />, the TOC pointer, r2, is <xref linkend="dbdoclet.50655241_73385" />, the TOC pointer, r2, is
commonly initialized by the global function entry point when a function commonly initialized by the global function entry point when a function
@@ -4476,14 +4414,19 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
dynamic linker. For references through function pointers, it is the dynamic linker. For references through function pointers, it is the
compiler's or assembler programmer's responsibility to insert compiler's or assembler programmer's responsibility to insert
appropriate TOC save and restore code. If the function is called from appropriate TOC save and restore code. If the function is called from
the same module as the callee, the callee must preserve the value of the same module as the callee, the callee must <phrase
r2. (See revisionflag="added">normally</phrase> preserve the value of r2.
<xref linkend="dbdoclet.50655241_69294" /> for a description of function <phrase revisionflag="added">However, if the callee's symbol table
entry conventions.)</para> entry is flagged to indicate the callee does not preserve r2, the
<para>When a function calls another function, the TOC pointer must have caller is responsible for saving and restoring the TOC pointer if it
a legal value pointing to the TOC base, which may be initialized as needs it.</phrase> (See <phrase revisionflag="changed"><xref
described in linkend="dbdoclet.50655241_FnLinkage" />
<xref linkend="dbdoclet.50655242_47739" />.</para> for more information.</phrase>)</para>
<para>When a function calls another function <phrase
revisionflag="added">that requires a TOC pointer</phrase>, the TOC
pointer must have a legal value pointing to the TOC base, which may be
initialized as described in <xref
linkend="dbdoclet.50655242_47739" />.</para>
<para>When global data is accessed, the TOC pointer must be available <para>When global data is accessed, the TOC pointer must be available
for dereference at the point of all uses of values derived from the TOC for dereference at the point of all uses of values derived from the TOC
pointer in conjunction with the @l operator. This property is used by pointer in conjunction with the @l operator. This property is used by
@@ -4513,12 +4456,12 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
context.</para> context.</para>
</listitem> </listitem>
<listitem> <listitem>
<para>When a function is entered through its global entry point, <para>When a function <phrase revisionflag="added">that requires a
TOC pointer</phrase> is entered through its global entry point,
register r12 contains the entry-point address. For more register r12 contains the entry-point address. For more
information, see the description of dual entry points in information, see the description of dual entry points in
<xref linkend="dbdoclet.50655240___RefHeading___Toc377640597" /> and <xref linkend="dbdoclet.50655240___RefHeading___Toc377640597" />
and <xref linkend="dbdoclet.50655240_13754" />.</para>
<xref linkend="dbdoclet.50655240_13754" />.</para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
<para>&#160;</para> <para>&#160;</para>
@@ -5200,16 +5143,8 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
is volatile over a function call.</para> is volatile over a function call.</para>
<para>&#160;</para> <para>&#160;</para>
<bridgehead>TOC Pointer Doubleword</bridgehead> <bridgehead>TOC Pointer Doubleword</bridgehead>
<para>If a function <phrase revisionflag="added">in a TOC-based <para>If a function changes the value of the TOC pointer register,
compilation unit</phrase> changes the value of the TOC pointer it shall first save it in the TOC pointer doubleword.</para>
register, it shall first save it in the TOC pointer doubleword.
<phrase revisionflag="added">The TOC pointer doubleword is reserved
for future use for functions in a PC-relative compilation
unit. [To discuss: This has implications for alloca, as if we
reserve it for future use, then the TOC pointer doubleword must be
copied during a dynamic allocation operation. I suspect it is
better to suffer that slight penalty rarely in order to have the
flexibility to use this for another future purpose.]</phrase></para>
</section> </section>
<section xml:id="dbdoclet.50655240_15141"> <section xml:id="dbdoclet.50655240_15141">
<title>Optional Save Areas</title> <title>Optional Save Areas</title>
@@ -6250,6 +6185,20 @@ s6 - 72 (stored)</programlisting>
<para>When instructions hold relative addresses, a program library can be <para>When instructions hold relative addresses, a program library can be
loaded at various positions in virtual memory and is referred to as a loaded at various positions in virtual memory and is referred to as a
position-independent code model.</para> position-independent code model.</para>
<para revisionflag="added">When generating code for PowerISA version 3.1
or above, this specification provides two ways to address non-local data
and text. The historical method relies on a dedicated table-of-contents
(TOC) pointer to obtain such addresses. PowerISA version 3.1 introduces
new "PC-relative" instructions that can be used to obtain such
addresses relative to the current instruction address (CIA). Both
methods may be used in the same executable, dynamically shared
object (DSO), object file, or even in the same function. If a
function does not require a TOC pointer for addressing, it is not required
to establish this pointer in register r2, and may choose not to preserve
register r2's value provided that the function's symbol table entry is
appropriately annotated. Full details of function call linkage
requirements are provided in <xref linkend="dbdoclet.50655241_FnLinkage"
/>.</para>
<section xml:id="dbdoclet.50655240___RefHeading___Toc377640592"> <section xml:id="dbdoclet.50655240___RefHeading___Toc377640592">
<title>Code Model Overview</title> <title>Code Model Overview</title>
<para>Executable modules can be built to use either position-dependent or <para>Executable modules can be built to use either position-dependent or
@@ -6312,9 +6261,9 @@ lvx v1, 0, r12</programlisting>
</para> </para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
<programlisting revisionflag="added">pld r12, symbol@pcrel(0), 1 <programlisting revisionflag="added">pld r12, symbol@pcrel


plvx v1, symbol@pcrel(0), 1</programlisting> plxv v1, symbol@pcrel</programlisting>
<para>In the OpenPOWER ELF V2 ABI, position-dependent code built with <para>In the OpenPOWER ELF V2 ABI, position-dependent code built with
this addressing scheme may have a Global Offset Table (GOT) in the data this addressing scheme may have a Global Offset Table (GOT) in the data
segment that holds addresses. (For more information, see segment that holds addresses. (For more information, see
@@ -6355,7 +6304,7 @@ plvx v1, symbol@pcrel(0), 1</programlisting>
references and TOC-pointer initializations can be performed using a references and TOC-pointer initializations can be performed using a
two-instruction sequence.</para> two-instruction sequence.</para>
<para revisionflag="added"> <para revisionflag="added">
PC-relative offsets are always 34 bits for all code models, with PC-relative offsets are usually 34 bits for all code models, with
a maximum addressing reach of 16GB. The effective addressing reach a maximum addressing reach of 16GB. The effective addressing reach
for global data is 8GB, since data sections are always located at for global data is 8GB, since data sections are always located at
higher virtual addresses than text sections. higher virtual addresses than text sections.
@@ -6425,9 +6374,9 @@ lvx v1, 0, r12</programlisting>
private data).</para> private data).</para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
<programlisting revisionflag="added">pld r12, symbol@pcrel(0), 1 <programlisting revisionflag="added">pld r12, symbol@pcrel


plvx v1, symbol@pcrel(0), 1</programlisting> plxv v1, symbol@pcrel</programlisting>
<itemizedlist> <itemizedlist>
<listitem> <listitem>
<para revisionflag="added">By using PC-relative GOT-indirect <para revisionflag="added">By using PC-relative GOT-indirect
@@ -6435,10 +6384,10 @@ plvx v1, symbol@pcrel(0), 1</programlisting>
</para> </para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
<programlisting revisionflag="added">pld r12, symbol@got@pcrel(0), 1 <programlisting revisionflag="added">pld r12, symbol@got@pcrel
ld r12, 0(r12) ld r12, 0(r12)


pld r12, symbol@got@pcrel(0), 1 pld r12, symbol@got@pcrel
lvx v1, 0, r12</programlisting> lvx v1, 0, r12</programlisting>
<para revisionflag="added"> <para revisionflag="added">
A compiler may generate a PC-relative addressing sequence to access A compiler may generate a PC-relative addressing sequence to access
@@ -6450,16 +6399,6 @@ lvx v1, 0, r12</programlisting>
the data reference is satisfied at static link time. See the data reference is satisfied at static link time. See
<xref linkend="dbdoclet.50655241_OptPCRel" />. <xref linkend="dbdoclet.50655241_OptPCRel" />.
</para> </para>
<para revisionflag="added">[To discuss: I'd like to see the assembler
support "pld r12, symbol@pcrel" as an alternative to "pld r12,
symbol@pcrel(0), 1", and "pld r12, symbol@got@pcrel" as an
alternative to "pld r12, symbol@got@pcrel(0), 1". In general, any
prefix load/store with only two arguments is PC-relative; the
second argument is either a 34-bit offset or a GPR. Is this
reasonable or too confusing? Another alternative would be "pld r12,
symbol@pcrel(cia)" for an offset, and "pld r12, r5, cia" for the
GPR case. I guess we want something readable that isn't too
complex for the assembler to sort out.]</para>
<para>Position-independent executables or shared objects have a GOT in <para>Position-independent executables or shared objects have a GOT in
the data segment that holds addresses. When the system creates a memory the data segment that holds addresses. When the system creates a memory
image from the file, the GOT entries are updated to reflect the image from the file, the GOT entries are updated to reflect the
@@ -6477,11 +6416,11 @@ lvx v1, 0, r12</programlisting>
</section> </section>
<section xml:id="dbdoclet.50655240_19143"> <section xml:id="dbdoclet.50655240_19143">
<title>Code Models</title> <title>Code Models</title>
<bridgehead revisionflag="added">TOC-Based Compilation
Units</bridgehead>
<para>Compilers may provide different code models depending on the <para>Compilers may provide different code models depending on the
expected size of the TOC and the size of the entire executable or expected size of the TOC and the size of the entire executable or
shared library.</para> shared library. <phrase revisionflag="added">Assuming that the
TOC pointer is used to address data and/or text, the following
considerations apply:</phrase></para>
<itemizedlist> <itemizedlist>
<listitem> <listitem>
<para>Small code model: The TOC is accessed using 16-bit offsets <para>Small code model: The TOC is accessed using 16-bit offsets
@@ -6524,52 +6463,26 @@ lvx v1, 0, r12</programlisting>
TOCs, or by some other method. The suggested allocation order of TOCs, or by some other method. The suggested allocation order of
sections is provided in sections is provided in
<xref linkend="dbdoclet.50655241_66700" />.</para> <xref linkend="dbdoclet.50655241_66700" />.</para>
<bridgehead revisionflag="added">PC-Relative Compilation
Units</bridgehead>
<para revisionflag="added"> <para revisionflag="added">
Compilers may provide different code models depending on the size of PC-relative addressing may be used in either the small or the
the entire executable or shared library. There is no small code medium code model, and is identical for both. Accesses to
model for PC-relative compilation units. module-local code and data objects use PC-relative addressing with
</para> up to 34-bit offsets. Position-independent code uses PC-relative
<itemizedlist revisionflag="added"> GOT-indirect addressing to access other objects in the binary.
<listitem> If PC-relative addressing span is insufficient to reach any data
<para> item, that access must either be made relative to the TOC
Medium code model: Accesses to module-local code and data objects pointer, or a PC-relative indexed form instruction must be used
use PC-relative addressing with 34-bit offsets. for the access. PC-relative indexed form instructions provide
Position-independent code uses PC-relative GOT-indirect up to 64 bits of offset from the current instruction address.
addressing to access other objects in the binary. [To discuss: I'm deliberately leaving this flexible for now.
</para> Any concerns? It appears we will probably not see a
</listitem> load-high-immediate-32 sort of instruction in P10, so we won't
<listitem> be able to define those kinds of relocs yet.]
<para>
Large code model: Used when 34-bit offsets are insufficient to
reach global data or the GOT from at least one text section,
this is similar to the medium code model, except that up to
64-bit PC-relative offsets are used by generating them into a
register. [To discuss: None of the options for this seem ideal.
It takes about 5 instructions to generate a 64-bit constant into
a register, though we can perhaps use linker optimizations to
replace with a smaller sequence when available. A second choice
is to place the offset in a .quad in the text section to reach
the .got entry, but this would incur a load-load dependency.
(Are there cases where this requires a text relocation resolution
during dynamic linking?) A third choice is to fail the compile
and require TOC addressing with large code model when 34-bit
offsets aren't enough, though that doesn't initially seem
reasonable. Whatever we choose, we should document the sequence
and any associated linker optimizations.]
</para>
</listitem>
</itemizedlist>
<para revisionflag="added">
As with TOC-based compilation units, the medium code model is the
default for compilers, and is applicable to most programs and
libraries. The code examples in this document generally use the
medium code model.
</para> </para>
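<para>As a rough illustration of the reach limit described above, the following Python sketch checks whether a target address can be reached from a given instruction address with a 34-bit offset. The helper name fits_pcrel34 and the sample addresses are hypothetical, and the sketch assumes the offset is a signed byte displacement from the current instruction address. It is not part of this specification.</para>

```python
# Hypothetical helper for illustration; not defined by this ABI.
PCREL_BITS = 34  # offset width stated in the surrounding text


def fits_pcrel34(cia, target):
    """Return True if target is reachable from cia with a signed 34-bit offset."""
    half_span = 2 ** (PCREL_BITS - 1)  # 8 GiB in each direction (assumed signed)
    return (target - cia) in range(-half_span, half_span)


# Assumed sample addresses, chosen only for illustration.
text_addr = 0x10000000
near_data = text_addr + 2 ** 20   # 1 MiB above the instruction
far_data = text_addr + 2 ** 33    # exactly 8 GiB above the instruction
```

<para>Under these assumptions, near_data is within reach and far_data is not. In the latter case the text above calls for a TOC-relative access or a PC-relative indexed form instruction.</para>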
<para revisionflag="added"> <para revisionflag="added">
When linking PC-relative relocatable objects, the linker should When linking objects that contain PC-relative relocations, the
attempt to place the .got section near the text sections. linker should attempt to place the .got section near the text
sections.
</para> </para>
</section> </section>
</section> </section>
@@ -6579,50 +6492,13 @@ lvx v1, 0, r12</programlisting>
section.</para> section.</para>
<section xml:id="dbdoclet.50655240___RefHeading___Toc377640597"> <section xml:id="dbdoclet.50655240___RefHeading___Toc377640597">
<title>Function Prologue</title> <title>Function Prologue</title>
<para revisionflag="added">The function prologue is responsible for <para>A function's prologue establishes addressability by
the following functions:</para>
<itemizedlist revisionflag="added">
<listitem>
<para>Establishing addressability to global data</para>
</listitem>
<listitem>
<para>Creating a stack frame when required</para>
</listitem>
<listitem>
<para>Saving any nonvolatile registers that are used by the
function</para>
</listitem>
<listitem>
<para>Saving any limited-access bits that are used by the function,
per the rules described in <xref
linkend="dbdoclet.50655240___RefHeading___Toc377640581" /></para>
</listitem>
</itemizedlist>
<para revisionflag="added">This ABI shall be used in conjunction with
the Power Architecture that implements the
<emphasis role="bold">mfocrf</emphasis> architecture level. Further,
OpenPOWER-compliant processors shall implement implementation-defined
bits in a manner to allow the combination of multiple
<emphasis role="bold">mfocrf</emphasis> results with an OR instruction;
for example, to yield a word in r0 including all three preserved CRs as
follows:</para>
<programlisting revisionflag="added">mfocrf r0, crf2
mfocrf r1, crf3
or r0, r0, r1
mfocrf r1, crf4
or r0, r0, r1</programlisting>
<para revisionflag="added">Specifically, this allows each
OpenPOWER-compliant processor implementation to set each field to hold
either 0 or the correct in-order value of the corresponding CR field at
the point where the <emphasis role="bold">mfocrf</emphasis>
instruction is performed.</para>
<bridgehead revisionflag="added">TOC-Based Compilation
Units</bridgehead>
<para><phrase revisionflag="changed">In a TOC-based compilation unit,
a</phrase> function's prologue establishes addressability by
initializing a TOC pointer in register r2, if necessary, and a stack initializing a TOC pointer in register r2, if necessary, and a stack
frame, if necessary, and may save any nonvolatile registers it frame, if necessary, and may save any nonvolatile registers it
uses.</para> uses. <phrase revisionflag="added">Not all functions must initialize
a TOC pointer, and not all functions must preserve the existing value
of r2. See <xref linkend="dbdoclet.50655241_FnLinkage" /> for more
information.</phrase></para>
<para>All functions have a global entry point (GEP) available to any <para>All functions have a global entry point (GEP) available to any
caller and pointing to the beginning of the prologue. Some functions caller and pointing to the beginning of the prologue. Some functions
may have a secondary entry point to optimize the cost of TOC pointer may have a secondary entry point to optimize the cost of TOC pointer
@@ -6636,7 +6512,9 @@ or r0, r0, r1</programlisting>
entry point when the r2 register is known to hold a valid TOC base entry point when the r2 register is known to hold a valid TOC base
value. Function pointers shared between modules shall always use the value. Function pointers shared between modules shall always use the
global entry point to specify the address of a function.</para> global entry point to specify the address of a function.</para>
<para>When a linker causes control to transfer to a global entry point, <para>When a linker causes control to transfer to a global entry point
<phrase revisionflag="added">of a function that requires a TOC
pointer</phrase>,
it must insert a glue code sequence that loads r12 with the global it must insert a glue code sequence that loads r12 with the global
entry-point address. Code at the global entry point can assume that entry-point address. Code at the global entry point can assume that
register r12 points to the GEP.</para> register r12 points to the GEP.</para>
@@ -6653,10 +6531,9 @@ addi r2, r2, .TOC.-func@l</programlisting>
form that is faster due to instruction fusion, such as:</para> form that is faster due to instruction fusion, such as:</para>
<programlisting>lis r2, .TOC.@ha <programlisting>lis r2, .TOC.@ha
addi r2, r2, .TOC.@l</programlisting> addi r2, r2, .TOC.@l</programlisting>
<para revisionflag="deleted">In addition to establishing <para>In addition to establishing addressability, the function prologue
addressability, the function prologue
is responsible for the following functions:</para> is responsible for the following functions:</para>
<itemizedlist revisionflag="deleted"> <itemizedlist>
<listitem> <listitem>
<para>Creating a stack frame when required</para> <para>Creating a stack frame when required</para>
</listitem> </listitem>
@@ -6670,7 +6547,7 @@ addi r2, r2, .TOC.@l</programlisting>
<xref linkend="dbdoclet.50655240___RefHeading___Toc377640581" /></para> <xref linkend="dbdoclet.50655240___RefHeading___Toc377640581" /></para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
<para revisionflag="deleted">This ABI shall be used in conjunction with <para>This ABI shall be used in conjunction with
the Power Architecture that implements the the Power Architecture that implements the
<emphasis role="bold">mfocrf</emphasis> architecture level. Further, <emphasis role="bold">mfocrf</emphasis> architecture level. Further,
OpenPOWER-compliant processors shall implement implementation-defined OpenPOWER-compliant processors shall implement implementation-defined
@@ -6678,12 +6555,12 @@ addi r2, r2, .TOC.@l</programlisting>
<emphasis role="bold">mfocrf</emphasis> results with an OR instruction; for example, <emphasis role="bold">mfocrf</emphasis> results with an OR instruction; for example,
to yield a word in r0 including all three preserved CRs as to yield a word in r0 including all three preserved CRs as
follows:</para> follows:</para>
<programlisting revisionflag="deleted">mfocrf r0, crf2 <programlisting>mfocrf r0, crf2
mfocrf r1, crf3 mfocrf r1, crf3
or r0, r0, r1 or r0, r0, r1
mfocrf r1, crf4 mfocrf r1, crf4
or r0, r0, r1</programlisting> or r0, r0, r1</programlisting>
<para revisionflag="deleted">Specifically, this allows each <para>Specifically, this allows each
OpenPOWER-compliant processor implementation to set each field to hold OpenPOWER-compliant processor implementation to set each field to hold
either 0 or the correct in-order value of the corresponding CR field at either 0 or the correct in-order value of the corresponding CR field at
the point where the <emphasis role="bold">mfocrf</emphasis> the point where the <emphasis role="bold">mfocrf</emphasis>
@@ -6707,14 +6584,6 @@ or r0, r0, r1</programlisting>
the meaning of the second parameter, which is put in the three the meaning of the second parameter, which is put in the three
most-significant bits of the st_other field in the ELF Symbol Table most-significant bits of the st_other field in the ELF Symbol Table
entry.</para> entry.</para>
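<para>A minimal sketch of how a tool might read the three most-significant bits of st_other is shown below in Python. The helper name local_entry_bits and the sample values are hypothetical. The sketch assumes st_other is the 8-bit field of an ELF symbol table entry and only extracts the bits; the meaning of each value is defined by this specification, not by the sketch.</para>

```python
# Hypothetical helper for illustration; not defined by this ABI.
def local_entry_bits(st_other):
    """Return the 3-bit value held in the top three bits of an 8-bit st_other."""
    if st_other not in range(256):
        raise ValueError("st_other is expected to be an 8-bit field")
    return st_other >> 5  # keeps bits 7..5 of the byte


# Assumed sample value: the three most-significant bits are 001.
sample = 0b00100000
```

<para>With the sample value above, local_entry_bits(sample) yields 1, the binary pattern 001.</para>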
<bridgehead revisionflag="added">PC-Relative Compilation
Units</bridgehead>
<para revisionflag="added">
In a PC-relative compilation unit, the function prologue does not
require any setup code to establish addressability to global data.
Therefore there is also no need for a function to have a separate
local entry point.
</para>
</section> </section>
<section xml:id="dbdoclet.50655240_13754"> <section xml:id="dbdoclet.50655240_13754">
<title>Function Epilogue</title> <title>Function Epilogue</title>
@@ -7438,12 +7307,12 @@ ptr = &amp;dst;
.extern dst .extern dst
.extern ptr .extern ptr
.section ".text" .section ".text"
plwz r9, src@pcrel(0), 1 plwz r9, src@pcrel
pstw r9, dst@pcrel(0), 1 pstw r9, dst@pcrel
paddi r11, 0, dst@pcrel, 1 paddi r11, dst@pcrel
pstd r11, ptr@pcrel(0), 1 pstd r11, ptr@pcrel
pld r11, ptr@pcrel(0), 1 pld r11, ptr@pcrel
plwz r9, src@pcrel(0), 1 plwz r9, src@pcrel
stw r9, 0(r11)</programlisting> stw r9, 0(r11)</programlisting>
</entry> </entry>
</row> </row>
@@ -7467,8 +7336,8 @@ stw r9, 0(r11)</programlisting>
a signed 32-bit offset from a base register.</para> a signed 32-bit offset from a base register.</para>
</listitem> </listitem>
<listitem> <listitem>
<para>For a PIC code (see <para>For <phrase revisionflag="changed">TOC-based</phrase> PIC
<xref linkend="dbdoclet.50655240_page77" /> and code (see <xref linkend="dbdoclet.50655240_page77" /> and
<xref linkend="dbdoclet.50655240_19926" />), the offset in the <xref linkend="dbdoclet.50655240_19926" />), the offset in the
Global Offset Table where the value of the symbol is stored is Global Offset Table where the value of the symbol is stored is
given by the assembly syntax symbol@got. This syntax represents the given by the assembly syntax symbol@got. This syntax represents the
@@ -7611,8 +7480,8 @@ nop</programlisting>
</listitem> </listitem>
</orderedlist> </orderedlist>
<para revisionflag="added"> <para revisionflag="added">
For a function call in a PC-relative compilation unit, the nop in For a function call in a function that does not preserve r2, the nop in
<xref linkend="dbdoclet.50655240_85319" /> should not be generated. <xref linkend="dbdoclet.50655240_85319" /> need not be generated.
</para> </para>
<para>For indirect function calls, the address of the function to be <para>For indirect function calls, the address of the function to be
called is placed in r12 and the CTR register. A bctrl instruction is used called is placed in r12 and the CTR register. A bctrl instruction is used
@@ -7688,9 +7557,6 @@ bctrl</programlisting>
<para> <para>
<xref linkend="dbdoclet.50655240_16744" /> shows how to make an indirect <xref linkend="dbdoclet.50655240_16744" /> shows how to make an indirect
function call using small-model position-independent code. function call using small-model position-independent code.
<phrase revisionflag="added">Note that the store and reload of the
TOC pointer r2 is not required in a PC-relative compilation
unit.</phrase>
</para> </para>


<figure xml:id="dbdoclet.50655240_16744"> <figure xml:id="dbdoclet.50655240_16744">
@ -7762,9 +7628,6 @@ ld r2,24(r1)</programlisting>
<para> <para>
<xref linkend="dbdoclet.50655240_95225" /> shows how to make an indirect <xref linkend="dbdoclet.50655240_95225" /> shows how to make an indirect
function call using large-model position-independent code. function call using large-model position-independent code.
<phrase revisionflag="added">Note that the store and reload of the
TOC pointer r2 is not required in a PC-relative compilation
unit.</phrase>
</para> </para>


<figure xml:id="dbdoclet.50655240_95225"> <figure xml:id="dbdoclet.50655240_95225">
@ -7776,8 +7639,14 @@ ld r2,24(r1)</programlisting>
</imageobject> </imageobject>
</mediaobject> </mediaobject>
</figure> </figure>
<!--table frame="all" pgwide="1" xml:id="dbdoclet.50655240_95225"> <para>
<title>Large-Model Position-Independent Indirect Function Call</title> <xref linkend="dbdoclet.50655240_PCRelPICIndirect" /> shows how to
make an indirect function call using PC-relative addressing in a
function that does not preserve r2. [TBD: Formatting]
</para>
<table frame="all" pgwide="1"
xml:id="dbdoclet.50655240_PCRelPICIndirect">
<title>PC-Relative Position-Independent Indirect Function Call</title>
<tgroup cols="2"> <tgroup cols="2">
<colspec colname="c1" colwidth="30*" /> <colspec colname="c1" colwidth="30*" />
<colspec colname="c2" colwidth="70*" /> <colspec colname="c2" colwidth="70*" />
@ -7799,59 +7668,69 @@ ld r2,24(r1)</programlisting>
<row> <row>
<entry> <entry>
<programlisting>extern void function( ); <programlisting>extern void function( );

extern void (*ptrfunc) ( ); extern void (*ptrfunc) ( );
ptrfunc=function; ptrfunc=function;







(*ptrfunc) ( ); (*ptrfunc) ( );








</programlisting> </programlisting>
</entry> </entry>
<entry> <entry>
<programlisting> <programlisting>.section .text


addis r9,r2,ptrfunc@got@ha pld r9,ptrfunc@got@pcrel
ld r9,ptrfunc@got@l(r9) pld r0,function@got@pcrel
addis r12,r2,function@got@ha std r0,0(r9)
ld r12,function@got@l(r12)
std r12,0(r9)


addis r9,r2,ptrfunc@got@ha pld r9, ptrfunc@got@pcrel
ld r9,ptrfunc@got@l(r9)
ld r12,0(r9) ld r12,0(r9)
std r2,24(r1)
mtctr r12 mtctr r12
bctrl bctrl</programlisting>
ld r2,24(r1)</programlisting>
</entry> </entry>
</row> </row>
</tbody> </tbody>
</tgroup> </tgroup>
</table --> </table>
<bridgehead revisionflag="added">TOC-Based Compilation Units</bridgehead> <para>Function calls <phrase revisionflag="added">often</phrase>
<para>Function calls need to be performed in conjunction with need to be performed in conjunction with
establishing, maintaining, and restoring addressability through the TOC establishing, maintaining, and restoring addressability through the TOC
pointer register, r2. When a function is called, the TOC pointer register pointer register, r2. When a function is called, the TOC pointer register
may be modified. The caller must provide a nop after the bl instruction may be modified. <phrase revisionflag="added">In many cases,</phrase>
performing a call, if r2 is not known to have the same value in the <phrase revisionflag="changed">the</phrase> caller must provide a nop
callee. This is generally true for external calls. The linker will after the bl instruction performing a call, if r2 is not known to have
replace the nop with an r2 restoring instruction if the caller and callee the same value in the callee. This is generally true for external calls.
use different r2 values, The linker leaves it unchanged if they use the The linker will replace the nop with an r2 restoring instruction if the
same r2 value. This scheme avoids having a compiler generate an caller and callee use different r2 values<phrase
revisionflag="changed">.</phrase> The linker leaves it unchanged if they
use the same r2 value. This scheme avoids having a compiler generate an
overconservative r2 save and restore around every external call.</para> overconservative r2 save and restore around every external call.</para>
<para revisionflag="added">
There are two cases where the caller should not provide a nop after
the bl instruction performing a call:
<itemizedlist spacing="compact">
<listitem><para>When the caller is not guaranteed to preserve r2 (see
<xref linkend="dbdoclet.50655241_95185" />); or</para></listitem>
<listitem><para>When the callee is in the same compilation unit and
is guaranteed to preserve r2.</para></listitem>
</itemizedlist>
In both cases, the bl instruction must be marked with an
R_PPC64_REL24_NOTOC relocation.
</para>
<para>For calls to functions resolved at runtime, the linker must <para>For calls to functions resolved at runtime, the linker must
generate stub code to load the function address from the PLT.</para> generate stub code to load the function address from the PLT.</para>
<para>The stub code also must save r2 to 24(r1) unless the call is marked <para>The stub code also must save r2 to 24(r1) unless
<phrase revisionflag="added">either the call is marked with an
R_PPC64_REL24_NOTOC relocation as above, or</phrase>
the call is marked
with an R_PPC64_TOCSAVE relocation that points to a nop provided in the with an R_PPC64_TOCSAVE relocation that points to a nop provided in the
caller's prologue. In that case, the stub code can omit the r2 save. caller's prologue. In <phrase revisionflag="changed">either</phrase>
Instead, the linker replaces the prologue nop with an r2 save.</para> case, the stub code can omit the r2 save.
<phrase revisionflag="changed">In the latter case,</phrase>
the linker replaces the prologue nop with an r2 save.</para>
<programlisting>tocsaveloc: <programlisting>tocsaveloc:
nop nop
... ...
@ -7868,19 +7747,6 @@ bl target
<xref linkend="dbdoclet.50655240___RefHeading___Toc377640597" />, <xref linkend="dbdoclet.50655240___RefHeading___Toc377640597" />,
<xref linkend="dbdoclet.50655241_95185" />, and <xref linkend="dbdoclet.50655241_95185" />, and
<xref linkend="dbdoclet.50655241_47572" />.</para> <xref linkend="dbdoclet.50655241_47572" />.</para>
<bridgehead revisionflag="added">PC-Relative Compilation
Units</bridgehead>
<para revisionflag="added">
As with TOC-based compilation units, for calls to functions resolved at
runtime, the linker must generate stub code to load the function
address from the PLT. When the stub code is generated on behalf of
an indirect call in a PC-relative compilation unit, the linker may
omit the save and restore of r2 from the stub code. This behavior
is optional but recommended. Calls in PC-relative code should not
be marked with the R_PPC64_TOCSAVE or R_PPC64_REL24_NOTOC relocations.
[To discuss: Do we need a relocation to identify this as a PC-relative
call?]
</para>
</section> </section>
<section xml:id="dbdoclet.50655240_47036"> <section xml:id="dbdoclet.50655240_47036">
<title>Branching</title> <title>Branching</title>
@ -8277,12 +8143,7 @@ f1:
</figure> </figure>
<para revisionflag="added"> <para revisionflag="added">
<xref linkend="dbdoclet.50655240_PCRelSwitch" /> shows a switch <xref linkend="dbdoclet.50655240_PCRelSwitch" /> shows a switch
implementation for PC-relative compilation units. [TBD: This needs to implementation for PC-relative compilation units. [TBD: Formatting]
be a figure, not a table, which may require working with Annette and
FrameMaker to get something that looks similar to the other figures.
All we have in the document for the other figures is .png files from
the old FrameMaker version. Or maybe we should just convert all the
other figures to tables.]
</para> </para>
<table frame="all" pgwide="1" xml:id="dbdoclet.50655240_PCRelSwitch" <table frame="all" pgwide="1" xml:id="dbdoclet.50655240_PCRelSwitch"
revisionflag="added"> revisionflag="added">
@ -8328,7 +8189,7 @@ default:
<programlisting> cmplwi r12, 4 <programlisting> cmplwi r12, 4
bge .Ldefault bge .Ldefault
slwi r12, 2 slwi r12, 2
paddi r10, r0, .Ltab@pcrel, 1 paddi r10, .Ltab@pcrel
lwax r8, r10, r12 lwax r8, r10, r12
add r10, r8, r10 add r10, r8, r10
mtctr r10 mtctr r10
@ -8416,11 +8277,6 @@ addi r3,r1,p ; R3 = new data area following parameter save area.</pro
a value that needs to be preserved. In the future, if it is defined and a value that needs to be preserved. In the future, if it is defined and
if the function uses the Reserved word, the LR save doubleword must also if the function uses the Reserved word, the LR save doubleword must also
be copied.</para> be copied.</para>
<para revisionflag="added">
It is unnecessary to copy the TOC pointer doubleword for a
PC-relative compilation unit. [To discuss: Should we, for future
use of this slot for another purpose?]
</para>
<note> <note>
<para>Additional instructions will be necessary for an allocation of <para>Additional instructions will be necessary for an allocation of
variable size. If a dynamic deallocation will occur, the r1 stack variable size. If a dynamic deallocation will occur, the r1 stack
@ -8794,6 +8650,10 @@ addi r3,r1,p ; R3 = new data area following parameter save area.</pro
in the Itanium C++ ABI, the normative text on the issue. For information in the Itanium C++ ABI, the normative text on the issue. For information
about how to locate this material, see about how to locate this material, see
<xref linkend="dbdoclet.50655239___RefHeading___Toc377640569" />.</para> <xref linkend="dbdoclet.50655239___RefHeading___Toc377640569" />.</para>
<para revisionflag="added">
[Ignorant question to discuss: Are there any impacts to unwinding from
new r2 preservation rules?]
</para>


</section> </section>
</chapter> </chapter>

File diff suppressed because it is too large

@ -698,10 +698,12 @@ PPC_FEATURE_HAS_VSX 0x00000080 /* P7 Vector Extension. */
PPC_FEATURE_PSERIES_PERFMON_COMPAT 0x00000040 PPC_FEATURE_PSERIES_PERFMON_COMPAT 0x00000040
PPC_FEATURE_TRUE_LE 0x00000002 PPC_FEATURE_TRUE_LE 0x00000002
PPC_FEATURE_PPC_LE 0x00000001</programlisting> PPC_FEATURE_PPC_LE 0x00000001</programlisting>
<para revisionflag="added">Bit 0x00000004 is reserved for kernel use.
</para>
<para>AT_HWCAP2</para> <para>AT_HWCAP2</para>
<para>The a_val member of this entry is a bit map of hardware <para>The a_val member of this entry is a bit map of hardware
capabilities. Some bit mask values include:</para> capabilities. Some bit mask values include:</para>
<programlisting>PPC_FEATURE2_ARCH_2_07 0x80000000 /* ISA 2.07 */ <programlisting revisionflag="changed">PPC_FEATURE2_ARCH_2_07 0x80000000 /* ISA 2.07 */
PPC_FEATURE2_HAS_HTM 0x40000000 /* Hardware Transactional Memory */ PPC_FEATURE2_HAS_HTM 0x40000000 /* Hardware Transactional Memory */
PPC_FEATURE2_HAS_DSCR 0x20000000 /* Data Stream Control Register */ PPC_FEATURE2_HAS_DSCR 0x20000000 /* Data Stream Control Register */
PPC_FEATURE2_HAS_EBB 0x10000000 /* Event Base Branching */ PPC_FEATURE2_HAS_EBB 0x10000000 /* Event Base Branching */
@ -711,7 +713,10 @@ PPC_FEATURE2_HAS_VCRYPTO 0x02000000 /* The processor implements the
Vector.AES category */ Vector.AES category */
PPC_FEATURE2_HTM_NOSC 0x01000000 PPC_FEATURE2_HTM_NOSC 0x01000000
PPC_FEATURE2_ARCH_3_00 0x00800000 /* ISA 3.0 */ PPC_FEATURE2_ARCH_3_00 0x00800000 /* ISA 3.0 */
PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</programlisting> PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */
PPC_FEATURE2_DARN 0x00200000 /* darn instruction */
PPC_FEATURE2_SCV 0x00100000 /* scv syscall */
PPC_FEATURE2_HTM_NO_SUSPEND 0x00080000 /* TM without suspended state */</programlisting>
<para>When a process starts to execute, its stack holds the arguments, <para>When a process starts to execute, its stack holds the arguments,
environment, and auxiliary vector received from the exec call. The system environment, and auxiliary vector received from the exec call. The system
makes no guarantees about the relative arrangement of argument strings, makes no guarantees about the relative arrangement of argument strings,
@ -797,10 +802,6 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
linking code that references the .TOC. address. The GOT consists of an linking code that references the .TOC. address. The GOT consists of an
8-byte header that contains the TOC base (the first TOC base when 8-byte header that contains the TOC base (the first TOC base when
multiple TOCs are present), followed by an array of 8-byte addresses. multiple TOCs are present), followed by an array of 8-byte addresses.
<phrase revisionflag="added">
The 8-byte header value is undefined when all linked compilation units
are PC-relative.
</phrase>
The link editor shall emit dynamic relocations as appropriate for each The link editor shall emit dynamic relocations as appropriate for each
entry in the GOT. At runtime, the dynamic linker will apply these entry in the GOT. At runtime, the dynamic linker will apply these
relocations after the addresses of all memory segments are known (and relocations after the addresses of all memory segments are known (and
@ -816,10 +817,7 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
the executable or shared objects in a different process image. After the the executable or shared objects in a different process image. After the
initial mapping of the process image by the dynamic linker, memory initial mapping of the process image by the dynamic linker, memory
segments reside at fixed addresses for the life of a process.</para> segments reside at fixed addresses for the life of a process.</para>
<para><phrase revisionflag="added">When at least one TOC-based <para>The symbol .TOC. may be used to access the GOT or in TOC-relative
compilation unit is to be linked,</phrase>
<phrase revisionflag="changed">the</phrase>
symbol .TOC. may be used to access the GOT or in TOC-relative
addressing to other data constructs, such as the procedure linkage table. addressing to other data constructs, such as the procedure linkage table.
The symbol may be offset by 0x8000 bytes, or another offset, from the The symbol may be offset by 0x8000 bytes, or another offset, from the
start of the .got section. This offset allows the use of the full (64 KB) start of the .got section. This offset allows the use of the full (64 KB)
@ -830,15 +828,15 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
addis instruction to provide a first high-order 16-bit portion of a addis instruction to provide a first high-order 16-bit portion of a
32-bit displacement in conjunction with an instruction to supply a 32-bit displacement in conjunction with an instruction to supply a
low-order 16-bit portion of a 32-bit displacement.</para> low-order 16-bit portion of a 32-bit displacement.</para>
<para>In PIC code, the TOC pointer r2 points to the TOC base, enabling <para>In PIC code<phrase revisionflag="added"> that uses the
TOC</phrase>, the TOC pointer r2 points to the TOC base, enabling
easy reference. For static nonrelocatable modules, the GOT address is easy reference. For static nonrelocatable modules, the GOT address is
fixed and can be directly used by code.</para> fixed and can be directly used by code.</para>
<para>All functions <phrase revisionflag="added">in TOC-based <para revisionflag="deleted">All functions except leaf routines must
compilation units</phrase> except leaf routines must load the value of load the value of the TOC base into the TOC register r2.</para>
the TOC base into the TOC register r2.</para>
<para revisionflag="added"> <para revisionflag="added">
Functions in PC-relative compilation units access GOT entries directly Code may access GOT entries directly using PC-relative addressing,
using PC-relative addressing. where available.
</para> </para>
</section> </section>
<section> <section>
@ -998,13 +996,6 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
is indicated by use of a R_PPC64_REL24_NOTOC relocation (instead of is indicated by use of a R_PPC64_REL24_NOTOC relocation (instead of
R_PPC64_REL24) on the call instruction.</para> R_PPC64_REL24) on the call instruction.</para>
</listitem> </listitem>
<listitem revisionflag="added">
<para>
The caller is PC-relative and does not need to save the TOC
pointer. [To discuss: Do we need a relocation, or will we have
a module-level bit the linker can detect?]
</para>
</listitem>
</orderedlist> </orderedlist>
<para>In any scenario, the PLT call stub must transfer control to the <para>In any scenario, the PLT call stub must transfer control to the
function whose address is provided in the associated PLT entry. This function whose address is provided in the associated PLT entry. This
@ -1053,14 +1044,12 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
mtctr r12 mtctr r12
bctr</programlisting> bctr</programlisting>
<para revisionflag="added"> <para revisionflag="added">
A possible implementation for case 4 looks as follows: When PC-relative addressing is available, another simpler variant
may alternatively be used for cases 2 or 3:
</para> </para>
<programlisting>pld r12, func@plt@got@pcrel(0), 1 <programlisting revisionflag="added">pld r12, func@plt@pcrel
mtctr r12 mtctr r12
bctr</programlisting> bctr</programlisting>
<para revisionflag="added">
[To discuss: Is that the right assembly syntax?]
</para>
<para>To support lazy binding, the link editor also provides a set of <para>To support lazy binding, the link editor also provides a set of
symbol resolver stubs, one for each PLT entry. Each resolver stub symbol resolver stubs, one for each PLT entry. Each resolver stub
consists of a single instruction, which is usually a branch to a common consists of a single instruction, which is usually a branch to a common
@ -1133,10 +1122,7 @@ bctr</programlisting>
<para>After resolution, the value of a PLT entry in the PLT is the <para>After resolution, the value of a PLT entry in the PLT is the
address of the function's global entry point, unless the resolver address of the function's global entry point, unless the resolver
can determine that a module-local call occurs with a shared TOC value can determine that a module-local call occurs with a shared TOC value
wherein the TOC is shared between the caller and the wherein the TOC is shared between the caller and the callee.</para>
<phrase revisionflag="changed">callee,</phrase>
<phrase revisionflag="added">or a module-local call occurs in a
PC-relative compilation unit. [?]</phrase></para>
</section> </section>
</section> </section>
</section> </section>
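
The AT_HWCAP2 bit masks added in this commit (PPC_FEATURE2_DARN, PPC_FEATURE2_SCV, PPC_FEATURE2_HTM_NO_SUSPEND) can be tested by a program at run time. The following is a minimal sketch, not part of the commit, assuming a Linux system whose C library provides getauxval() in <sys/auxv.h>. The function names has_feature2 and report_new_features are hypothetical and chosen only for illustration. The mask values are copied from the listing in the diff.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP2 (assumes Linux with glibc or musl) */

/* Bit values copied from the AT_HWCAP2 listing in the diff.
   Guarded because PowerPC system headers may already define them. */
#ifndef PPC_FEATURE2_DARN
#define PPC_FEATURE2_DARN           0x00200000UL  /* darn instruction */
#endif
#ifndef PPC_FEATURE2_SCV
#define PPC_FEATURE2_SCV            0x00100000UL  /* scv syscall */
#endif
#ifndef PPC_FEATURE2_HTM_NO_SUSPEND
#define PPC_FEATURE2_HTM_NO_SUSPEND 0x00080000UL  /* TM without suspended state */
#endif

/* Hypothetical helper: true if any bit of mask is set in the hwcap2 word. */
bool has_feature2(unsigned long hwcap2, unsigned long mask)
{
    return (hwcap2 & mask) != 0;
}

/* Hypothetical helper: read the a_val of the AT_HWCAP2 auxiliary vector
   entry for the running process and print the three new bits. */
void report_new_features(void)
{
    unsigned long hwcap2 = getauxval(AT_HWCAP2);

    printf("darn:           %s\n",
           has_feature2(hwcap2, PPC_FEATURE2_DARN) ? "yes" : "no");
    printf("scv:            %s\n",
           has_feature2(hwcap2, PPC_FEATURE2_SCV) ? "yes" : "no");
    printf("HTM no-suspend: %s\n",
           has_feature2(hwcap2, PPC_FEATURE2_HTM_NO_SUSPEND) ? "yes" : "no");
}
```

Calling report_new_features() from main prints one line per capability. On a non-PowerPC host the masks have no architectural meaning, so the printed values are only meaningful on a Power system.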
