First draft of PC-relative changes for internal review.

Signed-off-by: Bill Schmidt <wschmidt@linux.ibm.com>
master
Bill Schmidt 7 years ago
parent e7042f258a
commit 16ef9435f5

@ -94,11 +94,11 @@
<revhistory> <revhistory>
<!-- TODO: Set the initial version information and clear any old information out --> <!-- TODO: Set the initial version information and clear any old information out -->
<revision> <revision>
<date>2018-03-02</date> <date>2018-03-14</date>
<revdescription> <revdescription>
<itemizedlist spacing="compact"> <itemizedlist spacing="compact">
<listitem> <listitem>
<para>Revision 1.5: POWER10 support.</para> <para>Revision 1.5a: PC-relative addressing first draft.</para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
</revdescription> </revdescription>

@ -4032,7 +4032,8 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
</figure> </figure>


<note> <note>
<para><xref linkend="dbdoclet.50655240_30073" /> , the alignment of the <para>In <xref linkend="dbdoclet.50655240_30073" />, the alignment
of the
structure is not affected by the unnamed short and int fields. The structure is not affected by the unnamed short and int fields. The
named members are aligned relative to the start of the structure. named members are aligned relative to the start of the structure.
However, it is possible that the alignment of the named members is However, it is possible that the alignment of the named members is
@ -4044,6 +4045,70 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
</section> </section>
</section> </section>
</section> </section>
<section revisionflag="added" xml:id="dbdoclet.50655240_AddrModel">
<title revisionflag="added">Global Data Addressing Models</title>
<para revisionflag="added">This specification provides for two global data
addressing models. The traditional addressing model, which we will call
"TOC-based," relies on a dedicated table-of-contents (TOC) pointer to
obtain the addresses of global data. Power ISA version 3.1 introduces new
"PC-relative" instructions that can be used to obtain the addresses of
global data relative to the current instruction address (CIA). Code that
is targeted to run on hardware compliant with Power ISA 3.1 may make use of
this capability with a "PC-relative" addressing model.</para>
<para revisionflag="added">Each compilation unit must adhere entirely to
one addressing model or the other. However, it is expressly possible to
link TOC-based and PC-relative compilation units into a single
executable, or to dynamically link from a compilation unit with one
addressing model to a compilation unit with the other addressing model.
In particular, a PC-relative compilation unit may be linked with an
existing TOC-based library. Note that a "compilation unit" may consist of
hand-written assembly code as well as high-level source code.</para>
<para revisionflag="added">Compilers and other tools performing
link-time optimizations that repackage functions into different
compilation units must not mix PC-relative and TOC-based functions in
the same compilation unit. [To discuss: This could be permitted, but
the value is unclear and it would be likely to spawn occasional
linker bugs.] Similarly, programmers should not be allowed to
specify that a single function in a TOC-based compilation unit use the
PC-relative addressing model, or vice versa; for example, by using GCC's
"#pragma GCC target" syntax. [To discuss: How should this be recorded and
communicated? Perhaps add to e_flags in the ELF header for module
objects only? We can communicate the need for PC-relative PLT stubs
to the linker on calls with a reloc, so the linker may not need this,
but perhaps other tools will?]</para>
<para revisionflag="added">Details of the two addressing models will be
provided throughout this specification. However, a brief description
of each is in order.</para>
<section revisionflag="added" xml:id="dbdoclet.50655240_TOCBased">
<title revisionflag="added">TOC-Based Addressing Model</title>
<para revisionflag="added">In the traditional TOC-based addressing model,
each function uses register r2 (see <xref
linkend="dbdoclet.50655240_68174" />) to access global memory. A variety
of techniques, known as TOC-relative, TOC-indirect, GOT-relative, etc.,
may be used to address the global data, but all these techniques use the
TOC pointer r2 as part of the data reference.</para>
<para revisionflag="added">With the cooperation of the linker, each
function in a TOC-based compilation unit is responsible for the
establishment and maintenance of its own TOC pointer. All functions
within a compilation unit have the same TOC pointer, so local function
calls may assume it does not change. An external function call may be
resolved to a function in a shared object having a different TOC
pointer, so a caller in a TOC-based compilation unit must save its TOC
pointer prior to making a call outside the compilation unit, and restore
its value upon return before the TOC pointer may be used to access global
data.</para>
</section>
<section revisionflag="added" xml:id="dbdoclet.50655240_PCRel">
<title revisionflag="added">PC-Relative Addressing Model</title>
<para revisionflag="added">A function in a PC-relative compilation unit
has no TOC pointer. All accesses to global data are made relative to
the current instruction address. Since functions in TOC-based
compilation units are responsible for establishment and maintenance
of their own TOC pointers, register r2 may be used freely within a
PC-relative compilation unit, with no need to save or restore the
register when modifying it.</para>
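<para>The r2 handling described for the two addressing models can be
summarized as a small decision function. The following C fragment is only
an illustrative sketch: the enum and the function name are hypothetical
and are not defined by this ABI.</para>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tags for the two addressing models described above. */
enum addr_model { MODEL_TOC_BASED, MODEL_PC_RELATIVE };

/*
 * Returns true when the TOC pointer in r2 must be saved before a call and
 * restored after it, following the rules stated in the text:
 *  - a PC-relative compilation unit has no TOC pointer, so r2 is free;
 *  - a TOC-based caller must preserve r2 only across calls that may
 *    leave the compilation unit.
 */
bool caller_must_preserve_r2(enum addr_model caller_model,
                             bool call_leaves_unit)
{
    if (caller_model == MODEL_PC_RELATIVE)
        return false;
    return call_leaves_unit;
}
```

<para>Under these assumptions, only a TOC-based caller making a call
outside its compilation unit pays for the save and restore.</para>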
</section>
</section>
<section xml:id="dbdoclet.50655240_85672"> <section xml:id="dbdoclet.50655240_85672">
<title>Function Calling Sequence</title> <title>Function Calling Sequence</title>
<para>The standard sequence for function calls is outlined in this section. <para>The standard sequence for function calls is outlined in this section.
@ -4208,15 +4273,22 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
</entry> </entry>
<entry> <entry>
<para>Nonvolatile<footnote> <para>Nonvolatile<footnote>
<para>Register r2 is nonvolatile with respect to calls <para><phrase revisionflag="changed">In a TOC-based
between functions in the same compilation unit. It is saved compilation unit, register</phrase> r2 is nonvolatile with
and restored by code inserted by the linker resolving a respect to calls between functions in the same compilation
call to an external function. For more information, see unit. It is saved and restored by code inserted by the linker
<xref linkend="dbdoclet.50655240_51083" />.</para> resolving a call to an external function. For more
</footnote></para> information, see <xref linkend="dbdoclet.50655240_51083"
/>.</para>
</footnote><phrase revisionflag="added"> or
Volatile<footnote>
<para>Register r2 is volatile and available for use in
PC-relative compilation units.</para>
</footnote></phrase></para>
</entry> </entry>
<entry> <entry>
<para>TOC pointer.</para> <para>TOC pointer <phrase revisionflag="added"> for
TOC-based compilation units</phrase>.</para>
</entry> </entry>
</row> </row>
<row> <row>
@ -4388,7 +4460,8 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
</table> </table>
<para>&#160;</para> <para>&#160;</para>
<bridgehead xml:id="dbdoclet.50655240_51083">TOC Pointer <bridgehead xml:id="dbdoclet.50655240_51083">TOC Pointer
Usage</bridgehead> Usage <phrase revisionflag="added">(TOC-Based Compilation Units
Only)</phrase></bridgehead>
<para>As described in <para>As described in
<xref linkend="dbdoclet.50655241_73385" />, the TOC pointer, r2, is <xref linkend="dbdoclet.50655241_73385" />, the TOC pointer, r2, is
commonly initialized by the global function entry point when a function commonly initialized by the global function entry point when a function
@ -4497,12 +4570,15 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
mask the value received from mask the value received from
<emphasis role="bold">mfocr</emphasis> to avoid corruption of the resulting <emphasis role="bold">mfocr</emphasis> to avoid corruption of the resulting
(partial) condition register word.</para> (partial) condition register word.</para>
<para>This erratum does not apply to the POWER9 processor.</para> <para>This erratum does not apply to <phrase
revisionflag="changed">POWER9 and subsequent
processors.</phrase></para>
</note> </note>


<para><anchor xml:id="dbdoclet.50655240_Power-ISA-version-and-the-user-s-manual" <para><anchor xml:id="dbdoclet.50655240_Power-ISA-version-and-the-user-s-manual"
xreflabel="" />For more information, see xreflabel="" />For more information, see
<citetitle>Power ISA</citetitle>, version 3.0 and "Fixed-Point Invalid <citetitle>Power ISA</citetitle>, version <phrase
revisionflag="changed">3.0B</phrase> and "Fixed-Point Invalid
Forms and Undefined Conditions" in Forms and Undefined Conditions" in
<citetitle>POWER9 Processor User's Manual.</citetitle></para> <citetitle>POWER9 Processor User's Manual.</citetitle></para>
<bridgehead>Floating-Point Registers</bridgehead> <bridgehead>Floating-Point Registers</bridgehead>
@ -5124,8 +5200,16 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
is volatile over a function call.</para> is volatile over a function call.</para>
<para>&#160;</para> <para>&#160;</para>
<bridgehead>TOC Pointer Doubleword</bridgehead> <bridgehead>TOC Pointer Doubleword</bridgehead>
<para>If a function changes the value of the TOC pointer register, it <para>If a function <phrase revisionflag="added">in a TOC-based
shall first save it in the TOC pointer doubleword.</para> compilation unit</phrase> changes the value of the TOC pointer
register, it shall first save it in the TOC pointer doubleword.
<phrase revisionflag="added">The TOC pointer doubleword is reserved
for future use for functions in a PC-relative compilation
unit. [To discuss: This has implications for alloca, as if we
reserve it for future use, then the TOC pointer doubleword must be
copied during a dynamic allocation operation. I suspect it is
better to suffer that slight penalty rarely in order to have the
flexibility to use this for another future purpose.]</phrase></para>
</section> </section>
<section xml:id="dbdoclet.50655240_15141"> <section xml:id="dbdoclet.50655240_15141">
<title>Optional Save Areas</title> <title>Optional Save Areas</title>
@ -5252,7 +5336,8 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
<para>Functions without a suitable declaration available to the <para>Functions without a suitable declaration available to the
caller to determine the called function's characteristics (for caller to determine the called function's characteristics (for
example, functions in C without a prototype in scope, in accordance example, functions in C without a prototype in scope, in accordance
with Brian Kernighan and Dennis Ritche, with Brian Kernighan and Dennis <phrase
revisionflag="changed">Ritchie</phrase>,
<citetitle>The C Programming Language</citetitle>, 1st <citetitle>The C Programming Language</citetitle>, 1st
edition).</para> edition).</para>
</listitem> </listitem>
@ -6220,6 +6305,16 @@ ld r12, 0(r12)


ld r12, symbol2@got(r2) ld r12, symbol2@got(r2)
lvx v1, 0, r12</programlisting> lvx v1, 0, r12</programlisting>
<itemizedlist>
<listitem>
<para revisionflag="added">
By using PC-relative addressing.
</para>
</listitem>
</itemizedlist>
<programlisting revisionflag="added">pld r12, symbol@pcrel(0), 1

plvx v1, symbol@pcrel(0), 1</programlisting>
<para>In the OpenPOWER ELF V2 ABI, position-dependent code built with <para>In the OpenPOWER ELF V2 ABI, position-dependent code built with
this addressing scheme may have a Global Offset Table (GOT) in the data this addressing scheme may have a Global Offset Table (GOT) in the data
segment that holds addresses. (For more information, see segment that holds addresses. (For more information, see
@ -6259,6 +6354,12 @@ lvx v1, 0, r12</programlisting>
loaded in the first 2 GB of the address space because direct address loaded in the first 2 GB of the address space because direct address
references and TOC-pointer initializations can be performed using a references and TOC-pointer initializations can be performed using a
two-instruction sequence.</para> two-instruction sequence.</para>
<para revisionflag="added">
PC-relative offsets are always 34 bits for all code models, with
a maximum addressing reach of 16 GB. The effective addressing reach
for global data is 8 GB, since data sections are always located at
higher virtual addresses than text sections.
</para>
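<para>The reach figures above follow from treating the 34-bit offset as a
signed displacement from the current instruction address. The following C
fragment is a minimal sketch of that arithmetic. It assumes
two's-complement sign extension, and the macro and function names are
hypothetical.</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Width of the PC-relative offset, as stated in the text above. */
#define PCREL_BITS 34

/* Limits of a signed two's-complement field of that width (assumption). */
#define PCREL_MIN (-(INT64_C(1) << (PCREL_BITS - 1)))    /* 8 GB backward   */
#define PCREL_MAX ((INT64_C(1) << (PCREL_BITS - 1)) - 1) /* just under 8 GB */

/*
 * Hypothetical helper: can the object at address target be reached from
 * the instruction at address cia with a single 34-bit offset?
 */
bool pcrel34_reaches(uint64_t cia, uint64_t target)
{
    /* The subtraction wraps modulo 2^64; the cast recovers the sign. */
    int64_t disp = (int64_t)(target - cia);
    return disp >= PCREL_MIN && disp <= PCREL_MAX;
}
```

<para>The total span between the two limits is 2 to the power 34 bytes
(16 GB), of which only the forward half is useful when data is placed
above text.</para>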
</section> </section>
<section> <section>
<title>Position-Independent Code</title> <title>Position-Independent Code</title>
@ -6318,6 +6419,47 @@ ld r12, 0(r12)


ld r12, symbol2@got(r2) ld r12, symbol2@got(r2)
lvx v1, 0, r12</programlisting> lvx v1, 0, r12</programlisting>
<itemizedlist>
<listitem>
<para revisionflag="added">By using PC-relative addressing (for
private data).</para>
</listitem>
</itemizedlist>
<programlisting revisionflag="added">pld r12, symbol@pcrel(0), 1

plvx v1, symbol@pcrel(0), 1</programlisting>
<itemizedlist>
<listitem>
<para revisionflag="added">By using PC-relative GOT-indirect
addressing (for shared data or very large span from code to data):
</para>
</listitem>
</itemizedlist>
<programlisting revisionflag="added">pld r12, symbol@got@pcrel(0), 1
ld r12, 0(r12)

pld r12, symbol@got@pcrel(0), 1
lvx v1, 0, r12</programlisting>
<para revisionflag="added">
A compiler may generate a PC-relative addressing sequence to access
static or restricted-visibility data, but must generate a PC-relative
GOT-indirect sequence for extern data. Extern data may be satisfied
from a statically or dynamically linked source, so the compiler must
be conservative. The compiler and linker can cooperate to replace a
PC-relative GOT-indirect sequence with a PC-relative sequence when
the data reference is satisfied at static link time. See
<xref linkend="dbdoclet.50655241_OptPCRel" />.
</para>
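<para>The division of labor between compiler and linker described above
can be sketched as two small decision functions. This C fragment is
illustrative only: the type and function names are hypothetical, and the
inputs are a simplified stand-in for what a real compiler or linker
knows about a symbol.</para>

```c
#include <assert.h>
#include <stdbool.h>

/* The two PC-relative access sequences shown in the listings above. */
enum access_seq {
    SEQ_PCREL,     /* pld rX, symbol@pcrel(0), 1                      */
    SEQ_GOT_PCREL  /* pld rX, symbol@got@pcrel(0), 1, then use 0(rX)  */
};

/*
 * Compiler side: static or restricted-visibility data may be reached
 * directly. Anything else is treated as extern data, which may be
 * satisfied by a dynamically linked source, so stay conservative.
 */
enum access_seq compiler_choose(bool is_static, bool restricted_visibility)
{
    if (is_static || restricted_visibility)
        return SEQ_PCREL;
    return SEQ_GOT_PCREL;
}

/*
 * Linker side: a GOT-indirect sequence may be replaced by a direct one
 * only when the reference is satisfied at static link time.
 */
enum access_seq linker_relax(enum access_seq seq, bool resolved_at_static_link)
{
    if (seq == SEQ_GOT_PCREL && resolved_at_static_link)
        return SEQ_PCREL;
    return seq;
}
```

<para>In this sketch, an extern reference that the static linker resolves
ends up with the same direct sequence the compiler would have chosen for
private data.</para>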
<para revisionflag="added">[To discuss: I'd like to see the assembler
support "pld r12, symbol@pcrel" as an alternative to "pld r12,
symbol@pcrel(0), 1", and "pld r12, symbol@got@pcrel" as an
alternative to "pld r12, symbol@got@pcrel(0), 1". In general, any
prefix load/store with only two arguments is PC-relative; the
second argument is either a 34-bit offset or a GPR. Is this
reasonable or too confusing? Another alternative would be "pld r12,
symbol@pcrel(cia)" for an offset, and "pld r12, r5, cia" for the
GPR case. I guess we want something readable that isn't too
complex for the assembler to sort out.]</para>
<para>Position-independent executables or shared objects have a GOT in <para>Position-independent executables or shared objects have a GOT in
the data segment that holds addresses. When the system creates a memory the data segment that holds addresses. When the system creates a memory
image from the file, the GOT entries are updated to reflect the image from the file, the GOT entries are updated to reflect the
@ -6335,6 +6477,8 @@ lvx v1, 0, r12</programlisting>
</section> </section>
<section xml:id="dbdoclet.50655240_19143"> <section xml:id="dbdoclet.50655240_19143">
<title>Code Models</title> <title>Code Models</title>
<bridgehead revisionflag="added">TOC-Based Compilation
Units</bridgehead>
<para>Compilers may provide different code models depending on the <para>Compilers may provide different code models depending on the
expected size of the TOC and the size of the entire executable or expected size of the TOC and the size of the entire executable or
shared library.</para> shared library.</para>
@ -6359,7 +6503,8 @@ lvx v1, 0, r12</programlisting>
addition, accesses to module-local code and data objects use TOC addition, accesses to module-local code and data objects use TOC
pointer relative addressing with 32-bit offsets. Using TOC pointer pointer relative addressing with 32-bit offsets. Using TOC pointer
relative addressing removes a level of indirection, resulting in relative addressing removes a level of indirection, resulting in
faster access and a smaller GOT. However. it limits the size of the faster access and a smaller GOT. <phrase
revisionflag="changed">However,</phrase> it limits the size of the
entire binary to between 2 GB and 4 GB, depending on the placement entire binary to between 2 GB and 4 GB, depending on the placement
of the TOC base.</para> of the TOC base.</para>
<note> <note>
@ -6379,6 +6524,53 @@ lvx v1, 0, r12</programlisting>
TOCs, or by some other method. The suggested allocation order of TOCs, or by some other method. The suggested allocation order of
sections is provided in sections is provided in
<xref linkend="dbdoclet.50655241_66700" />.</para> <xref linkend="dbdoclet.50655241_66700" />.</para>
<bridgehead revisionflag="added">PC-Relative Compilation
Units</bridgehead>
<para revisionflag="added">
Compilers may provide different code models depending on the size of
the entire executable or shared library. There is no small code
model for PC-relative compilation units.
</para>
<itemizedlist revisionflag="added">
<listitem>
<para>
Medium code model: Accesses to module-local code and data objects
use PC-relative addressing with 34-bit offsets.
Position-independent code uses PC-relative GOT-indirect
addressing to access other objects in the binary.
</para>
</listitem>
<listitem>
<para>
Large code model: Used when 34-bit offsets are insufficient to
reach global data or the GOT from at least one text section.
This model is similar to the medium code model, except that
PC-relative offsets of up to 64 bits are generated into a
register. [To discuss: None of the options for this seem ideal.
It takes about 5 instructions to generate a 64-bit constant into
a register, though we can perhaps use linker optimizations to
replace with a smaller sequence when available. A second choice
is to place the offset in a .quad in the text section to reach
the .got entry, but this would incur a load-load dependency.
(Are there cases where this requires a text relocation resolution
during dynamic linking?) A third choice is to fail the compile
and require TOC addressing with large code model when 34-bit
offsets aren't enough, though that doesn't initially seem
reasonable. Whatever we choose, we should document the sequence
and any associated linker optimizations.]
</para>
</listitem>
</itemizedlist>
<para revisionflag="added">
As with TOC-based compilation units, the medium code model is the
default for compilers, and is applicable to most programs and
libraries. The code examples in this document generally use the
medium code model.
</para>
<para revisionflag="added">
When linking PC-relative relocatable objects, the linker should
attempt to place the .got section near the text sections.
</para>
</section> </section>
</section> </section>
<section xml:id="dbdoclet.50655240_12107"> <section xml:id="dbdoclet.50655240_12107">
@ -6387,9 +6579,50 @@ lvx v1, 0, r12</programlisting>
section.</para> section.</para>
<section xml:id="dbdoclet.50655240___RefHeading___Toc377640597"> <section xml:id="dbdoclet.50655240___RefHeading___Toc377640597">
<title>Function Prologue</title> <title>Function Prologue</title>
<para>A function's prologue establishes addressability by initializing <para revisionflag="added">The function prologue is responsible for
a TOC pointer in register r2, if necessary, and a stack frame, if the following functions:</para>
necessary, and may save any nonvolatile registers it uses.</para> <itemizedlist revisionflag="added">
<listitem>
<para>Establishing addressability to global data</para>
</listitem>
<listitem>
<para>Creating a stack frame when required</para>
</listitem>
<listitem>
<para>Saving any nonvolatile registers that are used by the
function</para>
</listitem>
<listitem>
<para>Saving any limited-access bits that are used by the function,
per the rules described in <xref
linkend="dbdoclet.50655240___RefHeading___Toc377640581" /></para>
</listitem>
</itemizedlist>
<para revisionflag="added">This ABI shall be used in conjunction with
the Power Architecture that implements the
<emphasis role="bold">mfocrf</emphasis> architecture level. Further,
OpenPOWER-compliant processors shall implement implementation-defined
bits in a manner to allow the combination of multiple
<emphasis role="bold">mfocrf</emphasis> results with an OR instruction;
for example, to yield a word in r0 including all three preserved CRs as
follows:</para>
<programlisting revisionflag="added">mfocrf r0, crf2
mfocrf r1, crf3
or r0, r0, r1
mfocrf r1, crf4
or r0, r0, r1</programlisting>
<para revisionflag="added">Specifically, this allows each
OpenPOWER-compliant processor implementation to set each field to hold
either 0 or the correct in-order value of the corresponding CR field at
the point where the <emphasis role="bold">mfocrf</emphasis>
instruction is performed.</para>
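<para>The reason the OR combination works can be shown with a small model
of the condition register. The following C fragment is a sketch under
stated assumptions: it models the 32-bit CR image with CR0 in the most
significant four bits, and it models only one of the permitted behaviors,
in which every field other than the requested one reads as 0. The
function names are hypothetical.</para>

```c
#include <assert.h>
#include <stdint.h>

/* Mask selecting CR field n (0..7); CR0 is the most significant nibble. */
uint32_t crf_mask(unsigned n)
{
    return UINT32_C(0xF) << (4 * (7 - n));
}

/*
 * Model of mfocrf for field n: the requested field is correct and all
 * other fields are 0. An implementation may instead return the correct
 * value in other fields, which does not change the OR result for them.
 */
uint32_t mfocrf_model(uint32_t cr, unsigned n)
{
    return cr & crf_mask(n);
}

/* Mirrors the listing above: combine CR2, CR3, and CR4 with OR. */
uint32_t combine_preserved_crs(uint32_t cr)
{
    uint32_t r0 = mfocrf_model(cr, 2);
    uint32_t r1 = mfocrf_model(cr, 3);
    r0 |= r1;
    r1 = mfocrf_model(cr, 4);
    r0 |= r1;
    return r0;
}
```

<para>Because each field of each partial result is either 0 or the
correct value, OR-ing the results can never corrupt a preserved
field.</para>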
<bridgehead revisionflag="added">TOC-Based Compilation
Units</bridgehead>
<para><phrase revisionflag="changed">In a TOC-based compilation unit,
a</phrase> function's prologue establishes addressability by
initializing a TOC pointer in register r2, if necessary, and a stack
frame, if necessary, and may save any nonvolatile registers it
uses.</para>
<para>All functions have a global entry point (GEP) available to any <para>All functions have a global entry point (GEP) available to any
caller and pointing to the beginning of the prologue. Some functions caller and pointing to the beginning of the prologue. Some functions
may have a secondary entry point to optimize the cost of TOC pointer may have a secondary entry point to optimize the cost of TOC pointer
@ -6420,9 +6653,10 @@ addi r2, r2, .TOC.-func@l</programlisting>
form that is faster due to instruction fusion, such as:</para> form that is faster due to instruction fusion, such as:</para>
<programlisting>lis r2, .TOC.@ha <programlisting>lis r2, .TOC.@ha
addi r2, r2, .TOC.@l</programlisting> addi r2, r2, .TOC.@l</programlisting>
<para>In addition to establishing addressability, the function prologue <para revisionflag="deleted">In addition to establishing
addressability, the function prologue
is responsible for the following functions:</para> is responsible for the following functions:</para>
<itemizedlist> <itemizedlist revisionflag="deleted">
<listitem> <listitem>
<para>Creating a stack frame when required</para> <para>Creating a stack frame when required</para>
</listitem> </listitem>
@ -6436,24 +6670,25 @@ addi r2, r2, .TOC.@l</programlisting>
<xref linkend="dbdoclet.50655240___RefHeading___Toc377640581" /></para> <xref linkend="dbdoclet.50655240___RefHeading___Toc377640581" /></para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
<para>This ABI shall be used in conjunction with the Power Architecture <para revisionflag="deleted">This ABI shall be used in conjunction with
that implements the the Power Architecture that implements the
<emphasis role="bold">mfocrf</emphasis> architecture level. Further, <emphasis role="bold">mfocrf</emphasis> architecture level. Further,
OpenPOWER-compliant processors shall implement implementation-defined OpenPOWER-compliant processors shall implement implementation-defined
bits in a manner to allow the combination of multiple bits in a manner to allow the combination of multiple
<emphasis role="bold">mfocrf</emphasis> results with an OR instruction; for example, <emphasis role="bold">mfocrf</emphasis> results with an OR instruction; for example,
to yield a word in r0 including all three preserved CRs as to yield a word in r0 including all three preserved CRs as
follows:</para> follows:</para>
<programlisting>mfocrf r0, crf2 <programlisting revisionflag="deleted">mfocrf r0, crf2
mfocrf r1, crf3 mfocrf r1, crf3
or r0, r0, r1 or r0, r0, r1
mfocrf r1, crf4 mfocrf r1, crf4
or r0, r0, r1</programlisting> or r0, r0, r1</programlisting>
<para>Specifically, this allows each OpenPOWER-compliant processor <para revisionflag="deleted">Specifically, this allows each
implementation to set each field to hold either 0 or the correct OpenPOWER-compliant processor implementation to set each field to hold
in-order value of the corresponding CR field at the point where the either 0 or the correct in-order value of the corresponding CR field at
<emphasis role="bold">mfocrf</emphasis> instruction is performed.</para> the point where the <emphasis role="bold">mfocrf</emphasis>
<para>&#160;</para> instruction is performed.</para>
<para revisionflag="deleted">&#160;</para>
<bridgehead>Assembly Language Syntax for Defining Entry <bridgehead>Assembly Language Syntax for Defining Entry
Points</bridgehead> Points</bridgehead>
<para>When a function has two entry points, the global entry point is <para>When a function has two entry points, the global entry point is
@ -6472,6 +6707,14 @@ or r0, r0, r1</programlisting>
the meaning of the second parameter, which is put in the three the meaning of the second parameter, which is put in the three
most-significant bits of the st_other field in the ELF Symbol Table most-significant bits of the st_other field in the ELF Symbol Table
entry.</para> entry.</para>
<bridgehead revisionflag="added">PC-Relative Compilation
Units</bridgehead>
<para revisionflag="added">
In a PC-relative compilation unit, the function prologue does not
require any setup code to establish addressability to global data.
Therefore, there is also no need for a function to have a separate
local entry point.
</para>
</section> </section>
<section xml:id="dbdoclet.50655240_13754"> <section xml:id="dbdoclet.50655240_13754">
<title>Function Epilogue</title> <title>Function Epilogue</title>
@ -6884,11 +7127,13 @@ _restvr_31: addi r12,r0,-16
<xref linkend="dbdoclet.50655242_page119" /> shows an example of this <xref linkend="dbdoclet.50655242_page119" /> shows an example of this
method.</para> method.</para>
<para>Examples of absolute and position-independent compilations are <para>Examples of absolute and position-independent compilations are
shown in shown in <phrase revisionflag="changed"><xref
<xref linkend="dbdoclet.50655240_12719" />, linkend="dbdoclet.50655240_12719" />,
<xref linkend="dbdoclet.50655240_page77" />, and <xref linkend="dbdoclet.50655240_page77" />,
<xref linkend="dbdoclet.50655240_19926" />. These examples show the C <xref linkend="dbdoclet.50655240_19926" />, and
language statements together with the generated assembly language. The <xref linkend="dbdoclet.50655240_StaticPCRel" /></phrase>. These
examples show the
C language statements together with the generated assembly language. The
assumption for these figures is that only executables can use absolute assumption for these figures is that only executables can use absolute
addressing while shared objects must use position-independent code addressing while shared objects must use position-independent code
addressing. The figures are intended to demonstrate the compilation of addressing. The figures are intended to demonstrate the compilation of
@ -7151,6 +7396,60 @@ stw r0,0,(r7)</programlisting>
</tbody> </tbody>
</tgroup> </tgroup>
</table> </table>

<table frame="all" pgwide="1" xml:id="dbdoclet.50655240_StaticPCRel"
revisionflag="added">
<title>PC-Relative Load and Store</title>
<tgroup cols="2">
<colspec colname="c1" colwidth="30*" />
<colspec colname="c2" colwidth="70*" />
<thead>
<row>
<entry>
<para>
<emphasis role="bold">C Code</emphasis>
</para>
</entry>
<entry>
<para>
<emphasis role="bold">Assembly Code</emphasis>
</para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<programlisting>extern int src;
extern int dst;
int *ptr;

dst = src;

ptr = &amp;dst;

*ptr = src;


</programlisting>
</entry>
<entry>
<programlisting>.extern src
.extern dst
.extern ptr
.section ".text"
plwz r9, src@pcrel(0), 1
pstw r9, dst@pcrel(0), 1
paddi r11, 0, dst@pcrel, 1
pstd r11, ptr@pcrel(0), 1
pld r11, ptr@pcrel(0), 1
plwz r9, src@pcrel(0), 1
stw r9, 0(r11)</programlisting>
</entry>
</row>
</tbody>
</tgroup>
</table>
<note> <note>
<itemizedlist> <itemizedlist>
<listitem> <listitem>
@ -7311,9 +7610,16 @@ nop</programlisting>
<xref linkend="dbdoclet.50655242_20388" />.</para> <xref linkend="dbdoclet.50655242_20388" />.</para>
</listitem> </listitem>
</orderedlist> </orderedlist>
<para revisionflag="added">
For a function call in a PC-relative compilation unit, the nop in
<xref linkend="dbdoclet.50655240_85319" /> should not be generated.
</para>
<para>For indirect function calls, the address of the function to be <para>For indirect function calls, the address of the function to be
called is placed in r12 and the CTR register. A bctrl instruction is used called is placed in r12 and the CTR register. A bctrl instruction is used
to perform the indirect branch as shown in to perform the indirect branch as shown in
<phrase revisionflag="added">
<xref linkend="dbdoclet.50655240_95364" />,
</phrase>
<xref linkend="dbdoclet.50655240_16744" />, and <xref linkend="dbdoclet.50655240_16744" />, and
<xref linkend="dbdoclet.50655240_95225" />. The ELF V2 ABI requires the <xref linkend="dbdoclet.50655240_95225" />. The ELF V2 ABI requires the
address of the called function to be in r12 when a cross-module function address of the called function to be in r12 when a cross-module function
@ -7381,7 +7687,11 @@ bctrl</programlisting>
</table --> </table -->
<para> <para>
<xref linkend="dbdoclet.50655240_16744" /> shows how to make an indirect <xref linkend="dbdoclet.50655240_16744" /> shows how to make an indirect
function call using small-model position-independent code.</para> function call using small-model position-independent code.
<phrase revisionflag="added">Note that the store and reload of the
TOC pointer r2 are not required in a PC-relative compilation
unit.</phrase>
</para>


<figure xml:id="dbdoclet.50655240_16744"> <figure xml:id="dbdoclet.50655240_16744">
<title>Small-Model Position-Independent Indirect Function Call</title> <title>Small-Model Position-Independent Indirect Function Call</title>
@ -7451,7 +7761,11 @@ ld r2,24(r1)</programlisting>
</table --> </table -->
<para> <para>
<xref linkend="dbdoclet.50655240_95225" /> shows how to make an indirect <xref linkend="dbdoclet.50655240_95225" /> shows how to make an indirect
function call using large-model position-independent code.</para> function call using large-model position-independent code.
<phrase revisionflag="added">Note that the store and reload of the
TOC pointer r2 are not required in a PC-relative compilation
unit.</phrase>
</para>


<figure xml:id="dbdoclet.50655240_95225"> <figure xml:id="dbdoclet.50655240_95225">
<title>Large-Model Position-Independent Indirect Function Call</title> <title>Large-Model Position-Independent Indirect Function Call</title>
@ -7521,6 +7835,7 @@ ld r2,24(r1)</programlisting>
</tbody> </tbody>
</tgroup> </tgroup>
</table --> </table -->
<bridgehead revisionflag="added">TOC-Based Compilation Units</bridgehead>
<para>Function calls need to be performed in conjunction with <para>Function calls need to be performed in conjunction with
establishing, maintaining, and restoring addressability through the TOC establishing, maintaining, and restoring addressability through the TOC
pointer register, r2. When a function is called, the TOC pointer register pointer register, r2. When a function is called, the TOC pointer register
@ -7553,6 +7868,19 @@ bl target
<xref linkend="dbdoclet.50655240___RefHeading___Toc377640597" />, <xref linkend="dbdoclet.50655240___RefHeading___Toc377640597" />,
<xref linkend="dbdoclet.50655241_95185" />, and <xref linkend="dbdoclet.50655241_95185" />, and
<xref linkend="dbdoclet.50655241_47572" />.</para> <xref linkend="dbdoclet.50655241_47572" />.</para>
<bridgehead revisionflag="added">PC-Relative Compilation
Units</bridgehead>
<para revisionflag="added">
As with TOC-based compilation units, for calls to functions resolved at
runtime, the linker must generate stub code to load the function
address from the PLT. When the stub code is generated on behalf of
an indirect call in a PC-relative compilation unit, the linker may
omit the save and restore of r2 from the stub code. This behavior
is optional but recommended. Calls in PC-relative code should not
be marked with the R_PPC64_TOCSAVE or R_PPC64_REL24_NOTOC relocations.
[To discuss: Do we need a relocation to identify this as a PC-relative
call?]
</para>
</section> </section>
<section xml:id="dbdoclet.50655240_47036"> <section xml:id="dbdoclet.50655240_47036">
<title>Branching</title> <title>Branching</title>
@ -7947,6 +8275,75 @@ f1:
.long .TOC. - Ldefault .long .TOC. - Ldefault
.long .TOC. - Lcase13</programlisting> .long .TOC. - Lcase13</programlisting>
</figure> </figure>
<para revisionflag="added">
<xref linkend="dbdoclet.50655240_PCRelSwitch" /> shows a switch
implementation for PC-relative compilation units. [TBD: This needs to
be a figure, not a table, which may require working with Annette and
FrameMaker to get something that looks similar to the other figures.
All we have in the document for the other figures is .png files from
the old FrameMaker version. Or maybe we should just convert all the
other figures to tables.]
</para>
<table frame="all" pgwide="1" xml:id="dbdoclet.50655240_PCRelSwitch"
revisionflag="added">
<title>
Position-Independent Switch Code (PC-Relative Addressing)
</title>
<tgroup cols="2">
<colspec colname="c1" colwidth="30*" />
<colspec colname="c2" colwidth="70*" />
<thead>
<row>
<entry>
<para>
<emphasis role="bold">C Code</emphasis>
</para>
</entry>
<entry>
<para>
<emphasis role="bold">Assembly Code</emphasis>
</para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<programlisting>switch(j)
{
case 0:
...
case 1:
...
case 3:
...
default:
...
}


</programlisting>
</entry>
<entry>
<programlisting> cmplwi r12, 4
bge .Ldefault
        slwi r12, r12, 2
paddi r10, r0, .Ltab@pcrel, 1
lwax r8, r10, r12
add r10, r8, r10
mtctr r10
bctr
.p2align 2
.Ltab:
.word (.Lcase0-.Ltab)
.word (.Lcase1-.Ltab)
.word (.Ldefault-.Ltab)
.word (.Lcase3-.Ltab)</programlisting>
</entry>
</row>
</tbody>
</tgroup>
</table>
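      <para revisionflag="added">
        The address computation performed by the dispatch sequence above can
        be modeled in C as follows. This is a minimal sketch for
        illustration only: the helper name and its parameters are
        hypothetical, and it assumes, as in the listing, that each table
        entry is a signed 32-bit offset relative to the address of the
        table itself.
      </para>
      <programlisting language="c" revisionflag="added"><![CDATA[#include <stdint.h>

/*
 * Hypothetical model of the PC-relative switch dispatch.
 * tab_addr     : run-time address of .Ltab (what paddi computes)
 * tab          : table contents, signed offsets relative to .Ltab
 * ncases       : number of table entries
 * j            : switch selector
 * default_addr : address of .Ldefault
 */
static uint64_t switch_target(uint64_t tab_addr, const int32_t *tab,
                              uint32_t ncases, uint32_t j,
                              uint64_t default_addr)
{
    /* cmplwi / bge: out-of-range selectors go to the default case. */
    if (j >= ncases)
        return default_addr;

    /* lwax sign-extends the entry; add rebases it on the table address. */
    return tab_addr + (uint64_t)(int64_t)tab[j];
}]]></programlisting>
      <para revisionflag="added">
        Because the entries are relative to the table, the table needs no
        dynamic relocations and can be placed in read-only storage.
      </para>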
</section> </section>
<section xml:id="dbdoclet.50655240_32686"> <section xml:id="dbdoclet.50655240_32686">
<title>Dynamic Stack Space Allocation</title> <title>Dynamic Stack Space Allocation</title>
@ -8019,6 +8416,11 @@ addi r3,r1,p ; R3 = new data area following parameter save area.</pro
a value that needs to be preserved. In the future, if it is defined and a value that needs to be preserved. In the future, if it is defined and
if the function uses the Reserved word, the LR save doubleword must also if the function uses the Reserved word, the LR save doubleword must also
be copied.</para> be copied.</para>
<para revisionflag="added">
It is unnecessary to copy the TOC pointer doubleword for a
PC-relative compilation unit. [To discuss: Should we, for future
use of this slot for another purpose?]
</para>
<note> <note>
<para>Additional instructions will be necessary for an allocation of <para>Additional instructions will be necessary for an allocation of
variable size. If a dynamic deallocation will occur, the r1 stack variable size. If a dynamic deallocation will occur, the r1 stack

@ -245,7 +245,9 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations.</progra
</section> </section>
<section xml:id="dbdoclet.50655241_66700"> <section xml:id="dbdoclet.50655241_66700">
<title>TOC</title> <title>TOC</title>
<para>The TOC is part of the data segment of an executable program.</para> <para>The TOC is part of the data segment of an executable program
<phrase revisionflag="added">built from at least one TOC-based object
file</phrase>.</para>
<para>This section describes a common layout of the TOC in an executable <para>This section describes a common layout of the TOC in an executable
file or shared object. Particular tools are not required to follow the file or shared object. Particular tools are not required to follow the
layout specified here.</para> layout specified here.</para>
@ -280,19 +282,21 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations.</progra
instruction of the two instruction form with a nop and rewriting the second instruction of the two instruction form with a nop and rewriting the second
instruction. Consequently, the TOC pointer must be live during the first instruction. Consequently, the TOC pointer must be live during the first
and second instruction of a two-instruction reference.)</para> and second instruction of a two-instruction reference.)</para>
<para>&#160;</para> <para revisionflag="deleted">&#160;</para>
<bridgehead>Modules Containing Multiple TOCs</bridgehead> <section>
<para>The link editor may create multiple TOCs. In such a case, the <title revisionflag="changed">Modules Containing Multiple TOCs</title>
constituent .got, .toc, .sdata, and .sbss sections are conceptually <para>The link editor may create multiple TOCs. In such a case, the
repeated as necessary, with each TOC typically using a TOC pointer value constituent .got, .toc, .sdata, and .sbss sections are conceptually
of its base plus 0x8000. Any constituent section of type SHT_NOBITS in repeated as necessary, with each TOC typically using a TOC pointer value
any TOC but the last is converted to type SHT_PROGBITS filled with of its base plus 0x8000. Any constituent section of type SHT_NOBITS in
zeros.</para> any TOC but the last is converted to type SHT_PROGBITS filled with
<para>When multiple TOCs are present, linking must take care to save, zeros.</para>
initialize, and restore TOC pointers within a single module when calling <para>When multiple TOCs are present, linking must take care to save,
from one function to a second function using a different TOC pointer initialize, and restore TOC pointers within a single module when calling
value. Many of the same issues associated with a cross-module call apply from one function to a second function using a different TOC pointer
also to calls within a module but using different TOC pointers.</para> value. Many of the same issues associated with a cross-module call apply
also to calls within a module but using different TOC pointers.</para>
</section>
</section> </section>
<section xml:id="dbdoclet.50655241_73385"> <section xml:id="dbdoclet.50655241_73385">
<title>Symbol Table</title> <title>Symbol Table</title>
@ -302,7 +306,9 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations.</progra
resolved dynamically by an associated shared object will have a symbol resolved dynamically by an associated shared object will have a symbol
table entry for that symbol. This entry will identify the symbol as table entry for that symbol. This entry will identify the symbol as
undefined by setting the st_shndx member to SHN_UNDEF.</para> undefined by setting the st_shndx member to SHN_UNDEF.</para>
<para>The OpenPOWER ABI uses the three most-significant bits in the <para><phrase revisionflag="added">For TOC-based compilation
units,</phrase> <phrase revisionflag="changed">the</phrase> OpenPOWER
ABI uses the three most-significant bits in the
symbol st_other field to specify the number of instructions between a symbol st_other field to specify the number of instructions between a
function's global entry point and local entry point. The global entry function's global entry point and local entry point. The global entry
point is used when it is necessary to set up the TOC pointer (r2) for the point is used when it is necessary to set up the TOC pointer (r2) for the
@ -2115,10 +2121,273 @@ my_func:
</tbody> </tbody>
</tgroup> </tgroup>
</informaltable> </informaltable>
<para revisionflag="added">
In the following figure, prefix34 specifies a 34-bit field split
between bits 14-31 and 48-63 of a doubleword. The other bits
remain unchanged. This is used by PC-relative load and store
instructions.
</para>
<informaltable frame="all" rowsep="0" colsep="0" revisionflag="added">
<tgroup cols="5">
<colspec colname="c1" colwidth="7*" />
<colspec colname="c2" colwidth="7*" />
<colspec colname="c3" colwidth="2*" />
<colspec colname="c4" colwidth="8*" />
<colspec colname="c5" colwidth="8*" />
<tbody>
<row>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
</row>
<row>
<entry align="left">
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry nameend="c5" namest="c3" align="center">
<para>prefix34</para>
</entry>
</row>
<row rowsep="1">
<entry align="left">
<para>0</para>
</entry>
<entry align="right" colsep="1">
<para>13</para>
</entry>
<entry align="left">
<para>14</para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right">
<para>31</para>
</entry>
</row>
<row>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
</row>
<row>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry nameend="c5" namest="c4" align="center">
<para>prefix34 (continued)</para>
</entry>
</row>
<row>
<entry align="left">
<para>32</para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para>47</para>
</entry>
<entry align="left">
<para>48</para>
</entry>
<entry align="right">
<para>63</para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
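      <para revisionflag="added">
        As an illustration only, the prefix34 layout above can be modeled in
        C as follows. This is a minimal sketch, not part of the ABI: the
        helper names are hypothetical, and it assumes the doubleword is held
        as a 64-bit integer with bit 0 as the most-significant bit, so that
        the prefix word occupies the upper 32 bits. How the two words are
        laid out in memory is outside its scope.
      </para>
      <programlisting language="c" revisionflag="added"><![CDATA[#include <assert.h>
#include <stdint.h>

/* Bits 14-31 (18 bits) and bits 48-63 (16 bits), MSB-0 numbering. */
#define PREFIX34_HI_MASK 0x0003FFFF00000000ULL
#define PREFIX34_LO_MASK 0x000000000000FFFFULL

/* Nonzero if value is representable as a signed 34-bit quantity. */
static int prefix34_fits(int64_t value)
{
    return value >= -(1LL << 33) && value < (1LL << 33);
}

/* Place value into the prefix34 field; all other bits are unchanged. */
static uint64_t prefix34_insert(uint64_t dword, int64_t value)
{
    assert(prefix34_fits(value));
    uint64_t v  = (uint64_t)value & 0x3FFFFFFFFULL; /* keep 34 bits      */
    uint64_t hi = (v >> 16) << 32;                  /* upper 18 bits     */
    uint64_t lo = v & 0xFFFFULL;                    /* lower 16 bits     */
    dword &= ~(PREFIX34_HI_MASK | PREFIX34_LO_MASK);
    return dword | hi | lo;
}

/* Recover the signed 34-bit value from the field. */
static int64_t prefix34_extract(uint64_t dword)
{
    uint64_t v = ((dword & PREFIX34_HI_MASK) >> 16)
               | (dword & PREFIX34_LO_MASK);
    /* Sign-extend from bit 33 of the 34-bit value. */
    return (int64_t)((v ^ (1ULL << 33)) - (1ULL << 33));
}]]></programlisting>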
<para revisionflag="added">
        In the following figure, prefix34ds is similar to prefix34, but
        holds only 32 bits: the two least-significant bits of the value must
        be zero and are not part of the field. This is used, for example,
        by the pldu instruction. In addition to the use of this relocation
        field with the DS forms, prefix34ds relocations are also used in
        conjunction with DQ forms, such as the plq instruction. In those
        instances, the linker and assembler collaborate to create valid DQ
        forms, and raise an error if the specified offset does not meet the
        constraints of a valid DQ instruction form displacement.
</para>
<informaltable frame="all" rowsep="0" colsep="0" revisionflag="added">
<tgroup cols="7">
<colspec colname="c1" colwidth="7*" />
<colspec colname="c2" colwidth="7*" />
<colspec colname="c3" colwidth="2*" />
<colspec colname="c4" colwidth="7*" />
<colspec colname="c5" colwidth="7*" />
<colspec colname="c6" colwidth="1*" />
<colspec colname="c7" colwidth="1*" />
<tbody>
<row>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
</row>
<row>
<entry align="left">
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry nameend="c7" namest="c3" align="center">
<para>prefix34ds</para>
</entry>
</row>
<row rowsep="1">
<entry align="left">
<para>0</para>
</entry>
<entry align="right" colsep="1">
<para>13</para>
</entry>
<entry align="left">
<para>14</para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right">
<para>31</para>
</entry>
</row>
<row>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
</row>
<row>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry nameend="c5" namest="c4" align="center" colsep="1">
<para>prefix34ds (continued)</para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
</row>
<row>
<entry align="left">
<para>32</para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para>47</para>
</entry>
<entry align="left">
<para>48</para>
</entry>
<entry align="right" colsep="1">
<para>61</para>
</entry>
<entry align="left">
<para>62</para>
</entry>
<entry align="right">
<para>63</para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
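      <para revisionflag="added">
        The alignment constraints described for prefix34ds can be sketched
        in C as follows. This is an illustrative model under stated
        assumptions, not a normative definition: the helper names are
        hypothetical, the doubleword is treated as a 64-bit integer with the
        prefix word in the upper 32 bits, and the alignment passed by the
        caller is assumed to be 4 for DS forms and 16 for DQ forms.
      </para>
      <programlisting language="c" revisionflag="added"><![CDATA[#include <stdint.h>

/* Bits 14-31 (18 bits) and bits 48-61 (14 bits), MSB-0 numbering. */
#define PREFIX34DS_HI_MASK 0x0003FFFF00000000ULL
#define PREFIX34DS_LO_MASK 0x000000000000FFFCULL

/*
 * Nonzero if offset fits in a signed 34-bit displacement and is a
 * multiple of align (assumed: 4 for DS forms, 16 for DQ forms).
 */
static int prefix34ds_valid(int64_t offset, int64_t align)
{
    int in_range = offset >= -(1LL << 33) && offset < (1LL << 33);
    return in_range && (offset & (align - 1)) == 0;
}

/*
 * Store offset >> 2 in the 32-bit field. Bits 62-63 of the doubleword
 * are not part of the field and are left unchanged.
 */
static uint64_t prefix34ds_insert(uint64_t dword, int64_t offset)
{
    uint64_t v  = (uint64_t)offset & 0x3FFFFFFFCULL; /* drop low 2 bits */
    uint64_t hi = (v >> 16) << 32;                   /* bits 14-31      */
    uint64_t lo = v & 0xFFFCULL;                     /* bits 48-61      */
    dword &= ~(PREFIX34DS_HI_MASK | PREFIX34DS_LO_MASK);
    return dword | hi | lo;
}]]></programlisting>
      <para revisionflag="added">
        A linker following this model would call the validity check first
        and report an error rather than silently truncate a misaligned
        displacement.
      </para>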
</section> </section>
<section xml:id="dbdoclet.50655241_51269"> <section xml:id="dbdoclet.50655241_51269">
<title>Relocation Notations</title> <title>Relocation Notations</title>
<para>The following notations are used in the relocation table.</para> <para>The following notations are used in the relocation table.</para>
<para revisionflag="added">
[There seem to be a number of missing notations in this table. We
have #higher[a], #highest[a], and got, and perhaps the @ notation
could use further description. Also, there is some usage of #high and
#higha instead of #hi and #ha, which I assume is a mistake.]
</para>
<para> </para> <para> </para>
<informaltable frame="none" rowsep="0" colsep="0"> <informaltable frame="none" rowsep="0" colsep="0">
<tgroup cols="2"> <tgroup cols="2">
@ -2350,6 +2619,15 @@ my_func:
<para>tp + tprel = (S + A)</para> <para>tp + tprel = (S + A)</para>
</entry> </entry>
</row> </row>
<row revisionflag="added">
<entry>
<para>pcrel</para>
</entry>
<entry>
<para>Represents the offset of the symbol being relocated
relative to the current instruction address.</para>
</entry>
</row>
<row> <row>
<entry> <entry>
<para>tlsgd</para> <para>tlsgd</para>
@ -4143,9 +4421,84 @@ my_func:
<para> </para> <para> </para>
</entry> </entry>
</row> </row>
<row revisionflag="added">
<entry>
<para>R_PPC64_PCREL34</para>
</entry>
<entry>
<para>256?</para>
</entry>
<entry>
<para>prefix34</para>
</entry>
<entry>
<para>@pcrel</para>
</entry>
</row>
<row revisionflag="added">
<entry>
<para>R_PPC64_PCREL34_DS</para>
</entry>
<entry>
<para>257?</para>
</entry>
<entry>
<para>prefix34ds*</para>
</entry>
<entry>
<para>@pcrel >> 2</para>
</entry>
</row>
<row revisionflag="added">
<entry>
<para>R_PPC64_GOT_PCREL34</para>
</entry>
<entry>
<para>258?</para>
</entry>
<entry>
<para>prefix34</para>
</entry>
<entry>
<para>@got@pcrel</para>
</entry>
</row>
<row revisionflag="added">
<entry>
<para>R_PPC64_GOT_PCREL34_DS</para>
</entry>
<entry>
<para>259?</para>
</entry>
<entry>
<para>prefix34ds*</para>
</entry>
<entry>
<para>@got@pcrel >> 2</para>
</entry>
</row>
<row revisionflag="added">
<entry>
<para>R_PPC64_PCREL_OPT</para>
</entry>
<entry>
<para>260?</para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
</row>
</tbody> </tbody>
</tgroup> </tgroup>
</table> </table>
<para revisionflag="added">
[To discuss: Assuming we build up 64-bit PC-relative offsets into a
register using shifts/adds, we'll need the #lo, #ha, #higher[a],
#highest[a] relocs to be defined also.]
</para>
</section> </section>
<section xml:id="dbdoclet.50655241_90220"> <section xml:id="dbdoclet.50655241_90220">
<title>Relocation Descriptions</title> <title>Relocation Descriptions</title>
@ -4239,6 +4592,13 @@ my_func:
associated with a global entry point. See associated with a global entry point. See
<xref linkend="dbdoclet.50655241_95185" /> for discussion of its <xref linkend="dbdoclet.50655241_95185" /> for discussion of its
use.</para> use.</para>
<para revisionflag="added">R_PPC64_PCREL_OPT</para>
<para revisionflag="added">
This relocation type requests that the annotated load or store
instruction and its immediately preceding instruction be optimized by
the linker when the referenced symbol can be statically resolved.
See <xref linkend="dbdoclet.50655241_OptPCRel" /> for details.
</para>
</section> </section>
<section> <section>
<title>Assembler Syntax</title> <title>Assembler Syntax</title>
@ -4301,10 +4661,14 @@ addi 2,2,.TOC.-func@l</programlisting>
requirements as indicated in this section.</para> requirements as indicated in this section.</para>
<section xml:id="dbdoclet.50655241_69294"> <section xml:id="dbdoclet.50655241_69294">
<title>Function Call</title> <title>Function Call</title>
<para>The static linker must modify a nop instruction after a bl function <para><phrase revisionflag="added">For TOC-based compilation
units,</phrase> <phrase revisionflag="changed">the</phrase>
static linker must modify a nop instruction after a bl function
call to restore the TOC pointer in r2 from 24(r1) when an external symbol call to restore the TOC pointer in r2 from 24(r1) when an external symbol
that may use the TOC may be called, as in that may use the TOC may be called, as in
<xref linkend="dbdoclet.50655240_88555" />. Object files must contain a <xref linkend="dbdoclet.50655240_88555" />.
<phrase revisionflag="added">TOC-based</phrase>
<phrase revisionflag="changed">object</phrase> files must contain a
nop slot after a bl instruction to an external symbol.</para> nop slot after a bl instruction to an external symbol.</para>
</section> </section>
<section> <section>
@ -4375,6 +4739,46 @@ target:
rewrite address references created using GOT-indirect loads and bl+4 rewrite address references created using GOT-indirect loads and bl+4
sequences to use TOC-relative address computation.</para> sequences to use TOC-relative address computation.</para>
</section> </section>
<section xml:id="dbdoclet.50655241_OptPCRel" revisionflag="added">
<title>Displacement Optimization for PC-Relative Accesses</title>
<para>
Compilers and assembly programmers must assume that references to
extern data having unrestricted visibility may be satisfied by a
dynamically linked object, and must therefore use PC-relative
GOT-indirect addressing for such references. A linker may
determine that such a reference is satisfied during static linking
and replace the reference with direct PC-relative addressing.
For example:
</para>
<programlisting>pld r12, symbol@got@pcrel(0), 1
lvx v1, 0, r12</programlisting>
<para>The previous sequence may be replaced by:</para>
<programlisting>nop
plvx v1, symbol@pcrel(0), 1</programlisting>
<para>
However, this optimization is not universally safe, since it
changes the value of r12 following the data reference. The
compiler or programmer must ensure that the value of r12 is not
subsequently used, and communicate a request for this optimization
          by placing an R_PPC64_PCREL_OPT relocation on the second instruction in
the sequence. The compiler or programmer must further ensure that
the two instructions are not separated by intervening instructions.
</para>
<para>
[To discuss: This optimization is crucial for making PC-relative
performance good enough to replace TOC-relative addressing. I
thought about allowing the compiler to separate the two instructions,
and place an instruction-distance value in the
          R_PPC64_PCREL_OPT relocation field, but ultimately I think this
becomes difficult to implement, and I hope that the load-from-DSO
case is infrequent enough that the load-load dependency won't kill
us. Definitely need other opinions/ideas here.]
</para>
<para>
[To discuss: Can we add optimizations for PC-relative offsets built
for large code model? Only applies if we use shift/add sequences.]
</para>
</section>
</section> </section>
<section> <section>
@ -6979,7 +7383,9 @@ nop</programlisting>
<entry> <entry>
<para>One-bit field. This field is set to 1 if this function <para>One-bit field. This field is set to 1 if this function
does not have a TOC. For example, a stackless leaf assembly does not have a TOC. For example, a stackless leaf assembly
language routine with no references to external objects.</para> language routine with no references to external objects.
<phrase revisionflag="added">[To discuss: What value should be
set for PC-relative functions?]</phrase></para>
</entry> </entry>
</row> </row>
<row> <row>
@ -7147,6 +7553,15 @@ nop</programlisting>
parameters are placed in the Parameter Save Area.</para> parameters are placed in the Parameter Save Area.</para>
</entry> </entry>
</row> </row>
<row revisionflag="added">
<entry>
<para>???</para>
</entry>
<entry>
<para>[To discuss: Can/should we add a flag for PC-relative?]
</para>
</entry>
</row>
</tbody> </tbody>
</tgroup> </tgroup>
</informaltable> </informaltable>

@ -796,14 +796,18 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
code that contains any of the various R_PPC64_GOT* relocations or when code that contains any of the various R_PPC64_GOT* relocations or when
linking code that references the .TOC. address. The GOT consists of an linking code that references the .TOC. address. The GOT consists of an
8-byte header that contains the TOC base (the first TOC base when 8-byte header that contains the TOC base (the first TOC base when
multiple TOCs are present), followed by an array of 8-byte addresses. The multiple TOCs are present), followed by an array of 8-byte addresses.
link editor shall emit dynamic relocations as appropriate for each entry <phrase revisionflag="added">
in the GOT. At runtime, the dynamic linker will apply these relocations The 8-byte header value is undefined when all linked compilation units
after the addresses of all memory segments are known (and thus the are PC-relative.
addresses of all symbols). While the GOT may be appear to be an array of </phrase>
absolute addresses, this ABI does not preclude the GOT containing The link editor shall emit dynamic relocations as appropriate for each
nonaddress entries and specifies the presence of nonaddress tls_index entry in the GOT. At runtime, the dynamic linker will apply these
entries.</para> relocations after the addresses of all memory segments are known (and
      thus the addresses of all symbols). While the GOT may appear to be an
array of absolute addresses, this ABI does not preclude the GOT
containing nonaddress entries and specifies the presence of nonaddress
tls_index entries.</para>
<para>Absolute addresses are generated for all GOT relocations by the <para>Absolute addresses are generated for all GOT relocations by the
dynamic linker before giving control to general application code. dynamic linker before giving control to general application code.
(However, IFUNC resolution functions may be invoked before relocation is (However, IFUNC resolution functions may be invoked before relocation is
@ -812,7 +816,10 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
the executable or shared objects in a different process image. After the the executable or shared objects in a different process image. After the
initial mapping of the process image by the dynamic linker, memory initial mapping of the process image by the dynamic linker, memory
segments reside at fixed addresses for the life of a process.</para> segments reside at fixed addresses for the life of a process.</para>
<para>The symbol .TOC. may be used to access the GOT or in TOC-relative <para><phrase revisionflag="added">When at least one TOC-based
compilation unit is to be linked,</phrase>
<phrase revisionflag="changed">the</phrase>
symbol .TOC. may be used to access the GOT or in TOC-relative
addressing to other data constructs, such as the procedure linkage table. addressing to other data constructs, such as the procedure linkage table.
The symbol may be offset by 0x8000 bytes, or another offset, from the The symbol may be offset by 0x8000 bytes, or another offset, from the
start of the .got section. This offset allows the use of the full (64 KB) start of the .got section. This offset allows the use of the full (64 KB)
@ -826,8 +833,13 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
<para>In PIC code, the TOC pointer r2 points to the TOC base, enabling <para>In PIC code, the TOC pointer r2 points to the TOC base, enabling
easy reference. For static nonrelocatable modules, the GOT address is easy reference. For static nonrelocatable modules, the GOT address is
fixed and can be directly used by code.</para> fixed and can be directly used by code.</para>
<para>All functions except leaf routines must load the value of the TOC <para>All functions <phrase revisionflag="added">in TOC-based
base into the TOC register r2.</para> compilation units</phrase> except leaf routines must load the value of
the TOC base into the TOC register r2.</para>
<para revisionflag="added">
Functions in PC-relative compilation units access GOT entries directly
using PC-relative addressing.
</para>
</section> </section>
<section> <section>
<title>Function Addresses</title> <title>Function Addresses</title>
@ -980,12 +992,19 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
bl target bl target
.reloc ., R_PPC64_TOCSAVE, tocsaveloc .reloc ., R_PPC64_TOCSAVE, tocsaveloc
nop</programlisting> nop</programlisting>
<orderedlist> <orderedlist continuation="continues">
<listitem> <listitem>
<para>3. The caller has not set up r2 to hold the TOC pointer. This <para>The caller has not set up r2 to hold the TOC pointer. This
is indicated by use of a R_PPC64_REL24_NOTOC relocation (instead of is indicated by use of a R_PPC64_REL24_NOTOC relocation (instead of
R_PPC64_REL24) on the call instruction.</para> R_PPC64_REL24) on the call instruction.</para>
</listitem> </listitem>
<listitem revisionflag="added">
<para>
The caller is PC-relative and does not need to save the TOC
pointer. [To discuss: Do we need a relocation, or will we have
a module-level bit the linker can detect?]
</para>
</listitem>
</orderedlist> </orderedlist>
<para>In any scenario, the PLT call stub must transfer control to the <para>In any scenario, the PLT call stub must transfer control to the
function whose address is provided in the associated PLT entry. This function whose address is provided in the associated PLT entry. This
@ -1033,6 +1052,15 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
ld r12,func@plt@l(r12) ld r12,func@plt@l(r12)
mtctr r12 mtctr r12
bctr</programlisting> bctr</programlisting>
<para revisionflag="added">
A possible implementation for case 4 looks as follows:
</para>
<programlisting>pld r12, func@plt@got@pcrel(0), 1
mtctr r12
bctr</programlisting>
<para revisionflag="added">
[To discuss: Is that the right assembly syntax?]
</para>
<para>To support lazy binding, the link editor also provides a set of <para>To support lazy binding, the link editor also provides a set of
symbol resolver stubs, one for each PLT entry. Each resolver stub symbol resolver stubs, one for each PLT entry. Each resolver stub
consists of a single instruction, which is usually a branch to a common consists of a single instruction, which is usually a branch to a common
@ -1103,10 +1131,12 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
res_1: b PLTresolve res_1: b PLTresolve
...</programlisting> ...</programlisting>
<para>After resolution, the value of a PLT entry in the PLT is the <para>After resolution, the value of a PLT entry in the PLT is the
      address of the functions global entry point, unless the resolver can address of the function's global entry point, unless the resolver
determine that a module-local call occurs with a shared TOC value wherein can determine that a module-local call occurs with a shared TOC value
the TOC is shared between the caller and the callee.</para> wherein the TOC is shared between the caller and the
<para> </para> <phrase revisionflag="changed">callee,</phrase>
<phrase revisionflag="added">or a module-local call occurs in a
PC-relative compilation unit. [?]</phrase></para>
</section> </section>
</section> </section>
</section> </section>
