|
|
|
@@ -18,11 +18,443 @@
|
|
|
|
|
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian"> |
|
|
|
|
|
|
|
|
|
<!-- Chapter Title goes here. --> |
|
|
|
|
|
|
|
|
<title>The POWER Bi-Endian Vector Programming Model</title> |
|
|
|
|
|
|
|
|
|
<para> |
|
|
|
|
To ensure portability of applications optimized to exploit the |
|
|
|
|
SIMD functions of POWER ISA processors, the ELF V2 ABI defines a |
|
|
|
|
set of functions and data types for SIMD programming. ELF |
|
|
|
|
V2-compliant compilers will provide suitable support for these |
|
|
|
|
functions, preferably as built-in functions that translate to one |
|
|
|
|
or more POWER ISA instructions. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
Compilers are encouraged, but not required, to provide built-in |
|
|
|
|
functions to access individual instructions in the IBM POWER® |
|
|
|
|
instruction set architecture. In most cases, each such built-in |
|
|
|
|
function should provide direct access to the underlying |
|
|
|
|
instruction. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
However, to ease porting between little-endian (LE) and big-endian |
|
|
|
|
(BE) POWER systems, and between POWER and other platforms, it is |
|
|
|
|
preferable that some built-in functions provide the same semantics |
|
|
|
|
on both LE and BE POWER systems, even if this means that the |
|
|
|
|
built-in functions are implemented with different instruction |
|
|
|
|
sequences for LE and BE. To achieve this, the vector built-in |
functions form a set derived from the operations provided by |
the Power vector SIMD instructions. Unlike traditional |
“hardware intrinsic” built-in |
|
|
|
|
functions, no fixed mapping exists between these built-in |
|
|
|
|
functions and the generated hardware instruction sequence. Rather, |
|
|
|
|
the compiler is free to generate optimized instruction sequences |
|
|
|
|
that implement the semantics of the program specified by the |
|
|
|
|
programmer using these built-in functions. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
This is primarily applicable to the POWER SIMD instructions. As |
|
|
|
|
we've seen, this set of instructions operates on groups of 2, 4, |
|
|
|
|
8, or 16 vector elements at a time in 128-bit registers. On a |
|
|
|
|
big-endian POWER platform, vector elements are loaded from memory |
|
|
|
|
into a register so that the 0th element occupies the high-order |
|
|
|
|
bits of the register, and the (N – 1)th element occupies the |
|
|
|
|
low-order bits of the register. This is referred to as big-endian |
|
|
|
|
element order. On a little-endian POWER platform, vector elements |
|
|
|
|
are loaded from memory such that the 0th element occupies the |
|
|
|
|
low-order bits of the register, and the (N – 1)th element |
|
|
|
|
occupies the high-order bits. This is referred to as little-endian |
|
|
|
|
element order. |
|
|
|
|
</para> |
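<para>
The two element orders described above can be modeled in portable C.
The following is a minimal sketch, not part of the ABI: the helper
name <code>element_low_bit</code> and its parameters are hypothetical,
and it only computes where an element would sit in a 128-bit register.
</para>

```c
/* Hypothetical helper (an illustration only, not an ABI function).
 * For a 128-bit register holding `count` equal-sized elements, return the
 * bit position, counted from the low-order end of the register, of the
 * least significant bit of element `index`. */
static unsigned element_low_bit(unsigned index, unsigned count, int big_endian)
{
    unsigned width = 128u / count;   /* element width in bits */

    /* Big-endian element order: element 0 occupies the high-order bits.
     * Little-endian element order: element 0 occupies the low-order bits. */
    unsigned slot = big_endian ? (count - 1u - index) : index;

    return slot * width;
}
```

<para>
Under these assumptions, element 0 of a four-word vector starts at bit 96
in big-endian element order and at bit 0 in little-endian element order.
</para>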
|
|
|
|
|
|
|
|
|
<section> |
|
|
|
|
|
|
|
|
<title>Vector Data Types</title> |
|
|
|
|
<para> |
|
|
|
|
Languages provide support for the data types in <xref |
|
|
|
|
linkend="VIPR.biendian.vectypes" /> to represent vector data |
|
|
|
|
types stored in vector registers. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
For the C and C++ programming languages (and related/derived |
languages), these data types may be accessed by the type names |
listed in <xref linkend="VIPR.biendian.vectypes" />, written with |
either the <code>vector</code> or the <code>__vector</code> |
keyword, when Power ISA SIMD language extensions are enabled. |
<!-- TODO: these language extensions are first mentioned here; introduce them earlier. --> |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
For the Fortran language, a separate table gives the |
correspondence between Fortran and C/C++ language types. |
<!-- TODO: add an xref to the Fortran type-correspondence table. --> |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
The assignment operator always performs a byte-by-byte data copy |
|
|
|
|
for vector data types. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
Like other C/C++ language types, vector types may be defined to |
|
|
|
|
have const or volatile properties. Vector data types can be |
|
|
|
|
defined as being in static, auto, and register storage. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
Pointers to vector types are defined like pointers of other |
|
|
|
|
C/C++ types. Pointers to vector objects may be defined to have |
|
|
|
|
const and volatile properties. The address held in a pointer to |
a vector object must be divisible by 16, because vector objects |
are always aligned on quadword (128-bit) boundaries. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
The preferred way to access vectors at an application-defined |
|
|
|
|
address is by using vector pointers and the C/C++ dereference |
|
|
|
|
operator <code>*</code>. As with other C/C++ data types, the |
array reference operator <code>[]</code> may be applied to a |
vector pointer, with the usual definition, to access the |
<emphasis>n</emphasis>th vector object relative to that |
pointer. The dereference operator <code>*</code> may |
|
|
|
|
<emphasis>not</emphasis> be used to access data that is not |
|
|
|
|
aligned at least to a quadword boundary. Built-in functions |
|
|
|
|
such as <code>vec_xl</code> and <code>vec_xst</code> are |
|
|
|
|
provided for unaligned data access. |
|
|
|
|
</para> |
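<para>
The alignment rule and the unaligned-access built-ins might be used as
in the following sketch. The helper names
<code>is_quadword_aligned</code> and <code>load4_unaligned</code> are
hypothetical. The sketch assumes that the compiler defines
<code>__VSX__</code> when <code>vec_xl</code> and <code>vec_xst</code>
are available, and falls back to <code>memcpy</code> elsewhere so that
it also builds on other platforms.
</para>

```c
#include <stdint.h>
#include <string.h>

#if defined(__VSX__)
#include <altivec.h>
#endif

/* Hypothetical helper: nonzero when `p` lies on a quadword (16-byte)
 * boundary, which is required before dereferencing a vector pointer. */
static int is_quadword_aligned(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}

/* Hypothetical helper: copy four floats from an address that may not be
 * quadword aligned into `out`, which may also be unaligned. */
static void load4_unaligned(const float *src, float out[4])
{
#if defined(__VSX__)
    /* Assumed POWER path: vec_xl and vec_xst accept unaligned addresses. */
    __vector float v = vec_xl(0, src);
    vec_xst(v, 0, out);
#else
    /* Portable stand-in used when the VSX built-ins are not available. */
    memcpy(out, src, 4 * sizeof(float));
#endif
}
```

<para>
A caller could test <code>is_quadword_aligned</code> first and
dereference a vector pointer only when it returns nonzero, using
<code>load4_unaligned</code> otherwise.
</para>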
|
|
|
|
<para> |
|
|
|
|
Compilers are expected to recognize multiple operations that |
can be combined into a single hardware instruction. For |
example, a load-and-splat hardware instruction might be |
generated for the following sequence: |
|
|
|
|
</para> |
|
|
|
|
<programlisting>double *double_ptr; |
|
|
|
|
register vector double vd = vec_splats(*double_ptr);</programlisting> |
|
|
|
|
<table frame="all" pgwide="1" xml:id="VIPR.biendian.vectypes"> |
|
|
|
|
<title>Vector Types</title> |
|
|
|
|
<tgroup cols="4"> |
|
|
|
|
<colspec colname="c1" colwidth="20*" /> |
|
|
|
|
<colspec colname="c2" colwidth="10*" align="center" /> |
|
|
|
|
<colspec colname="c3" colwidth="15*" align="center" /> |
|
|
|
|
<colspec colname="c4" colwidth="40*" /> |
|
|
|
|
<thead> |
|
|
|
|
<row> |
|
|
|
|
<entry align="center"> |
|
|
|
|
<para> |
|
|
|
|
<emphasis role="bold">Power SIMD C Types</emphasis> |
|
|
|
|
</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry align="center"> |
|
|
|
|
<para> |
|
|
|
|
<emphasis role="bold">sizeof</emphasis> |
|
|
|
|
</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry align="center"> |
|
|
|
|
<para> |
|
|
|
|
<emphasis role="bold">Alignment</emphasis> |
|
|
|
|
</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry align="center"> |
|
|
|
|
<para> |
|
|
|
|
<emphasis role="bold">Description</emphasis> |
|
|
|
|
</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
</thead> |
|
|
|
|
<tbody> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector unsigned char</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 16 unsigned bytes.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector signed char</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 16 signed bytes.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector bool char</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 16 bytes with a value of either 0 or |
|
|
|
|
2<superscript>8</superscript> – 1.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector unsigned short</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 8 unsigned halfwords.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector signed short</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 8 signed halfwords.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector bool short</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 8 halfwords with a value of either 0 or |
|
|
|
|
2<superscript>16</superscript> – 1.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector unsigned int</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 4 unsigned words.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector signed int</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 4 signed words.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector bool int</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 4 words with a value of either 0 or |
|
|
|
|
2<superscript>32</superscript> – 1.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector unsigned long<footnote xml:id="vlong"> |
|
|
|
|
<para>The vector long types are deprecated due to their |
|
|
|
|
ambiguity between 32-bit and 64-bit environments. The use |
|
|
|
|
of the vector long long types is preferred.</para> |
|
|
|
|
</footnote></para> |
|
|
|
|
<para>vector unsigned long long</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 2 unsigned doublewords.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector signed long<footnoteref linkend="vlong" /></para> |
|
|
|
|
<para>vector signed long long</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 2 signed doublewords.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector bool long<footnoteref linkend="vlong" /></para> |
|
|
|
|
<para>vector bool long long</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 2 doublewords with a value of either 0 or |
|
|
|
|
2<superscript>64</superscript> – 1.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector unsigned __int128</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 1 unsigned quadword.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector signed __int128</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 1 signed quadword.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector _Float16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 8 half-precision floats.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector float</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 4 single-precision floats.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
<row> |
|
|
|
|
<entry> |
|
|
|
|
<para>vector double</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>16</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Quadword</para> |
|
|
|
|
</entry> |
|
|
|
|
<entry> |
|
|
|
|
<para>Vector of 2 double-precision floats.</para> |
|
|
|
|
</entry> |
|
|
|
|
</row> |
|
|
|
|
</tbody> |
|
|
|
|
</tgroup> |
|
|
|
|
</table> |
|
|
|
|
</section> |
|
|
|
|
|
|
|
|
|
<section> |
|
|
|
|
<title>Vector Operators</title> |
|
|
|
|
<para> |
|
|
|
|
In addition to the dereference and assignment operators, the |
|
|
|
|
Power SIMD Vector Programming API <!-- TODO: confirm this name. --> provides the usual |
|
|
|
|
operators that are valid on pointers; these operators are also |
|
|
|
|
valid for pointers to vector types. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
The traditional C/C++ operators are defined on vector types |
|
|
|
|
with “do all” semantics for unary and binary <code>+</code>, |
|
|
|
|
unary and binary <code>-</code>, binary <code>*</code>, binary |
<code>%</code>, and binary <code>/</code>, as well as the unary |
and binary logical operators, the shift and comparison operators, and the |
|
|
|
|
ternary <code>?:</code> operator. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
For unary operators, the specified operation is performed on |
|
|
|
|
the corresponding base element of the single operand to derive |
|
|
|
|
the result value for each vector element of the vector |
|
|
|
|
result. The result type of unary operations is the type of the |
|
|
|
|
single input operand. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
For binary operators, the specified operation is performed on |
|
|
|
|
the corresponding base elements of both operands to derive the |
|
|
|
|
result value for each vector element of the vector |
|
|
|
|
result. Both operands of the binary operators must have the |
|
|
|
|
same vector type with the same base element type. The result |
type of a binary operator is the same as the type of its |
operands. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
Further, the array reference operator may be applied to vector |
|
|
|
|
data types, yielding an l-value corresponding to the specified |
|
|
|
|
element in accordance with the vector element numbering rules (see |
|
|
|
|
<xref linkend="VIPR.biendian.layout" />). An l-value may either |
|
|
|
|
be assigned a new value or accessed for reading its value. |
|
|
|
|
</para> |
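<para>
The element-wise ("do all") operator semantics and the array reference
operator can be illustrated with the sketch below. As an assumption made
for portability, it uses the GCC/Clang generic vector extension
(<code>vector_size</code>) as a stand-in for the
<code>vector signed int</code> type; the type name <code>v4si</code>
and both function names are hypothetical.
</para>

```c
/* Assumption: the GCC/Clang generic vector extension stands in for the
 * POWER `vector signed int` type so the sketch builds on any platform. */
typedef int v4si __attribute__((vector_size(16)));

/* Binary + and unary - are applied to every element independently. */
static v4si add_then_negate(v4si a, v4si b)
{
    v4si sum = a + b;   /* sum[i] = a[i] + b[i] for each element i */
    return -sum;        /* result[i] = -sum[i] for each element i */
}

/* The array reference operator yields an l-value for a single element:
 * it can be read and assigned like any other l-value. */
static int third_element_doubled(v4si v)
{
    v[2] = v[2] * 2;    /* read element 2, then assign it a new value */
    return v[2];
}
```

<para>
Both operands of <code>+</code> have the same vector type here, and the
result has that type as well, matching the rule stated above for binary
operators.
</para>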
|
|
|
|
</section> |
|
|
|
|
|
|
|
|
|
<section xml:id="VIPR.biendian.layout"> |
|
|
|
|
<title>Vector Layout and Element Numbering</title> |
|
|
|
|
<para> |
|
|
|
|
filler |
|
|
|
|
</para> |
|
|
|
|
</section> |
|
|
|
|
|
|
|
|
|
<section> |
|
|
|
|