|
|
@ -78,13 +78,17 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
|
|
|
|
languages), these data types may be accessed based on the type
|
|
|
|
languages), these data types may be accessed based on the type
|
|
|
|
names listed in <xref linkend="VIPR.biendian.vectypes" /> when
|
|
|
|
names listed in <xref linkend="VIPR.biendian.vectypes" /> when
|
|
|
|
Power ISA SIMD language extensions are enabled using either the
|
|
|
|
Power ISA SIMD language extensions are enabled using either the
|
|
|
|
<code>vector</code> or <code>__vector</code> keywords. NOTE
|
|
|
|
<code>vector</code> or <code>__vector</code> keywords. [FIXME:
|
|
|
|
THAT THIS IS THE FIRST TIME WE'VE MENTIONED THESE LANGUAGE
|
|
|
|
We haven't talked about these at all. Need to borrow some
|
|
|
|
EXTENSIONS, NEED TO FIX THAT.
|
|
|
|
description from the AltiVec PIM about the usage of vector,
|
|
|
|
|
|
|
|
bool, and pixel, and supplement with the problems this causes
|
|
|
|
|
|
|
|
with strict-ANSI C++. Maybe a separate section on "Language
|
|
|
|
|
|
|
|
Elements" should precede this one.]
|
|
|
|
</para>
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
<para>
|
|
|
|
For the Fortran language, OH YET ANOTHER STINKING TABLE gives a
|
|
|
|
For the Fortran language, [FIXME: link to table in later
|
|
|
|
correspondence between Fortran and C/C++ language types.
|
|
|
|
section] gives a correspondence between Fortran and C/C++
|
|
|
|
|
|
|
|
language types.
|
|
|
|
</para>
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
<para>
|
|
|
|
The assignment operator always performs a byte-by-byte data copy
|
|
|
|
The assignment operator always performs a byte-by-byte data copy
|
|
|
@ -413,9 +417,11 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
|
|
|
|
<title>Vector Operators</title>
|
|
|
|
<title>Vector Operators</title>
|
|
|
|
<para>
|
|
|
|
<para>
|
|
|
|
In addition to the dereference and assignment operators, the
|
|
|
|
In addition to the dereference and assignment operators, the
|
|
|
|
Power SIMD Vector Programming API (REALLY?) provides the usual
|
|
|
|
Power SIMD Vector Programming API [FIXME: If we're going to use
|
|
|
|
operators that are valid on pointers; these operators are also
|
|
|
|
a term like this, let's use it consistently; also, SIMD and
|
|
|
|
valid for pointers to vector types.
|
|
|
|
Vector are redundant] provides the usual operators that are
|
|
|
|
|
|
|
|
valid on pointers; these operators are also valid for pointers
|
|
|
|
|
|
|
|
to vector types.
|
|
|
|
</para>
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
<para>
|
|
|
|
The traditional C/C++ operators are defined on vector types
|
|
|
|
The traditional C/C++ operators are defined on vector types
|
|
|
@ -452,6 +458,273 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
|
|
|
|
|
|
|
|
|
|
|
|
<section xml:id="VIPR.biendian.layout">
|
|
|
|
<section xml:id="VIPR.biendian.layout">
|
|
|
|
<title>Vector Layout and Element Numbering</title>
|
|
|
|
<title>Vector Layout and Element Numbering</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
Vector data types consist of a homogeneous sequence of elements
|
|
|
|
|
|
|
|
of the base data type specified in the vector data
|
|
|
|
|
|
|
|
type. Individual elements of a vector can be addressed by a
|
|
|
|
|
|
|
|
vector element number. Element numbers can be established either
|
|
|
|
|
|
|
|
by counting from the “left” of a register and assigning the
|
|
|
|
|
|
|
|
left-most element the element number 0, or from the “right” of
|
|
|
|
|
|
|
|
the register and assigning the right-most element the element
|
|
|
|
|
|
|
|
number 0.
|
|
|
|
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
In big-endian environments, establishing element counts from the
|
|
|
|
|
|
|
|
left makes the element stored at the lowest memory address the
|
|
|
|
|
|
|
|
lowest-numbered element. Thus, when vectors and arrays of a
|
|
|
|
|
|
|
|
given base data type are overlaid, vector element 0 corresponds
|
|
|
|
|
|
|
|
to array element 0, vector element 1 corresponds to array
|
|
|
|
|
|
|
|
element 1, and so forth.
|
|
|
|
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
In little-endian environments, establishing element counts from
|
|
|
|
|
|
|
|
the right makes the element stored at the lowest memory address
|
|
|
|
|
|
|
|
the lowest-numbered element. Thus, when vectors and arrays of a
|
|
|
|
|
|
|
|
given base data type are overlaid, vector element 0 will
|
|
|
|
|
|
|
|
correspond to array element 0, vector element 1 will correspond
|
|
|
|
|
|
|
|
to array element 1, and so forth.
|
|
|
|
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
Consequently, the vector numbering schemes can be described as
|
|
|
|
|
|
|
|
big-endian and little-endian vector layouts and vector element
|
|
|
|
|
|
|
|
numberings.
|
|
|
|
|
|
|
|
</para>
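<para>
The relationship between the two numbering schemes can be sketched with a small portable model. This is only an illustration, not Power-specific code: the type <code>model_reg</code> and the helper functions are hypothetical names, and the model assumes a register holding four word elements.
</para>

```c
#include <stddef.h>

#define NELEMS 4 /* assumption: four word elements in a 128-bit register */

/* Modeled register: slot[0] is the left-most element,
   slot[NELEMS - 1] is the right-most element. */
typedef struct { int slot[NELEMS]; } model_reg;

/* Big-endian load: the lowest memory address lands in the left-most slot. */
static model_reg load_be(const int *mem) {
    model_reg r;
    for (size_t i = 0; i < NELEMS; i++)
        r.slot[i] = mem[i];
    return r;
}

/* Little-endian load: the lowest memory address lands in the right-most slot. */
static model_reg load_le(const int *mem) {
    model_reg r;
    for (size_t i = 0; i < NELEMS; i++)
        r.slot[NELEMS - 1 - i] = mem[i];
    return r;
}

/* Element number n, counting from the left (big-endian numbering). */
static int elem_from_left(const model_reg *r, size_t n) {
    return r->slot[n];
}

/* Element number n, counting from the right (little-endian numbering). */
static int elem_from_right(const model_reg *r, size_t n) {
    return r->slot[NELEMS - 1 - n];
}
```

<para>
In this model, numbering from the left after <code>load_be</code> and numbering from the right after <code>load_le</code> both make element <emphasis>n</emphasis> equal to array element <emphasis>n</emphasis>.
</para>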
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
For internal consistency, in the ELF V2 ABI, the default vector
|
|
|
|
|
|
|
|
layout and vector element ordering in big-endian environments
|
|
|
|
|
|
|
|
shall be big endian, and the default vector layout and vector
|
|
|
|
|
|
|
|
element ordering in little-endian environments shall be little
|
|
|
|
|
|
|
|
endian. [FIXME: Here's a purported ABI requirement; should this
|
|
|
|
|
|
|
|
somehow remain part of the ABI document?]
|
|
|
|
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
This element numbering shall also be used by the <code>[]</code>
|
|
|
|
|
|
|
|
accessor method to vector elements provided as an extension of
|
|
|
|
|
|
|
|
the C/C++ languages by some compilers, as well as for other
|
|
|
|
|
|
|
|
language extensions or library constructs that directly or
|
|
|
|
|
|
|
|
indirectly refer to elements by their element number.
|
|
|
|
|
|
|
|
</para>
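<para>
As a rough illustration of the <code>[]</code> accessor, the following sketch overlays a vector on an array and compares elements. It assumes a compiler that supports the generic GCC/Clang <code>vector_size</code> attribute, used here as a stand-in for the <code>vector</code> keyword so that the sketch also builds on non-Power hosts; <code>v4si</code> and <code>overlay_matches</code> are hypothetical names.
</para>

```c
#include <string.h>

/* Assumption: generic GCC/Clang vector extension, four 32-bit ints. */
typedef int v4si __attribute__((vector_size(16)));

/* Returns 1 if vector element i equals array element i for every i. */
static int overlay_matches(void) {
    int a[4] = {10, 20, 30, 40};
    v4si v;

    /* Overlay the vector on the array contents, byte for byte. */
    memcpy(&v, a, sizeof v);

    for (int i = 0; i < 4; i++) {
        if (v[i] != a[i])
            return 0;
    }
    return 1;
}
```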
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
Application programs may query the vector element ordering in
|
|
|
|
|
|
|
|
use by testing the <code>__VEC_ELEMENT_REG_ORDER__</code> macro. This macro
|
|
|
|
|
|
|
|
has two possible values:
|
|
|
|
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<informaltable frame="none" rowsep="0" colsep="0">
|
|
|
|
|
|
|
|
<tgroup cols="2">
|
|
|
|
|
|
|
|
<colspec colname="c1" colwidth="40*" />
|
|
|
|
|
|
|
|
<colspec colname="c2" colwidth="60*" />
|
|
|
|
|
|
|
|
<tbody>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>__ORDER_LITTLE_ENDIAN__</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>Vector elements use little-endian element ordering.</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>__ORDER_BIG_ENDIAN__</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>Vector elements use big-endian element ordering.</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
</row>
|
|
|
|
|
|
|
|
</tbody>
|
|
|
|
|
|
|
|
</tgroup>
|
|
|
|
|
|
|
|
</informaltable>
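<para>
A minimal sketch of such a query follows. The enumeration and the function <code>vec_element_order</code> are hypothetical names, and the fallback value is an assumption added so that the code also compiles with compilers that do not define the macro.
</para>

```c
/* Hypothetical result codes for this sketch. */
enum vec_order {
    VEC_ORDER_LITTLE,
    VEC_ORDER_BIG,
    VEC_ORDER_UNKNOWN /* macro not defined by this compiler */
};

/* Report the vector element ordering announced by the compiler. */
static enum vec_order vec_element_order(void) {
#if defined(__VEC_ELEMENT_REG_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    (__VEC_ELEMENT_REG_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    return VEC_ORDER_LITTLE;
#elif defined(__VEC_ELEMENT_REG_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__VEC_ELEMENT_REG_ORDER__ == __ORDER_BIG_ENDIAN__)
    return VEC_ORDER_BIG;
#else
    return VEC_ORDER_UNKNOWN;
#endif
}
```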
|
|
|
|
|
|
|
|
</section>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<section>
|
|
|
|
|
|
|
|
<title>Vector Built-In Functions</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
Some of the POWER SIMD hardware instructions refer, implicitly
|
|
|
|
|
|
|
|
or explicitly, to vector element numbers. For example, the
|
|
|
|
|
|
|
|
<code>vspltb</code> instruction has as one of its inputs an
|
|
|
|
|
|
|
|
index into a vector. The element at that index position is to
|
|
|
|
|
|
|
|
be replicated in every element of the output vector. For
|
|
|
|
|
|
|
|
another example, the <code>vmuleuh</code> instruction operates on
|
|
|
|
|
|
|
|
the even-numbered elements of its input vectors. The hardware
|
|
|
|
|
|
|
|
instructions define these element numbers using big-endian
|
|
|
|
|
|
|
|
element order, even when the machine is running in little-endian
|
|
|
|
|
|
|
|
mode. Thus, a built-in function that maps directly to the
|
|
|
|
|
|
|
|
underlying hardware instruction, regardless of the target
|
|
|
|
|
|
|
|
endianness, has the potential to confuse programmers on
|
|
|
|
|
|
|
|
little-endian platforms.
|
|
|
|
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
It is more useful to define built-in functions that map to these
|
|
|
|
|
|
|
|
instructions to use natural element order. That is, the
|
|
|
|
|
|
|
|
explicit or implicit element numbers specified by such built-in
|
|
|
|
|
|
|
|
functions should be interpreted using big-endian element order
|
|
|
|
|
|
|
|
on a big-endian platform, and using little-endian element order
|
|
|
|
|
|
|
|
on a little-endian platform.
|
|
|
|
|
|
|
|
</para>
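<para>
One way to picture the translation from natural element order to the big-endian numbering used by the hardware is the index arithmetic below. This is a sketch under the assumption that a little-endian element number <emphasis>n</emphasis> in a vector of <emphasis>N</emphasis> elements corresponds to big-endian element <emphasis>N</emphasis> - 1 - <emphasis>n</emphasis>; the function names are hypothetical and do not describe any particular compiler.
</para>

```c
#include <stdbool.h>

/* Map a natural-order element number to the big-endian element number
   that a hardware instruction expects. */
static unsigned hw_element_index(unsigned natural, unsigned nelems,
                                 bool little_endian) {
    return little_endian ? (nelems - 1u - natural) : natural;
}

/* True if the even-numbered elements in natural order are the
   odd-numbered elements in hardware (big-endian) order. */
static bool even_odd_swapped(unsigned nelems, bool little_endian) {
    return (hw_element_index(0u, nelems, little_endian) % 2u) == 1u;
}
```

<para>
Under this assumption, a splat of natural element 3 of a 16-element vector uses hardware index 12 on a little-endian platform, and an operation on even-numbered elements must select the hardware's odd-numbered elements.
</para>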
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
The descriptions of the built-in functions in <xref
|
|
|
|
|
|
|
|
linkend="VIPR.vec-ref" /> contain notes on endian issues that
|
|
|
|
|
|
|
|
apply to each built-in function. Furthermore, a built-in
|
|
|
|
|
|
|
|
function requiring a different compiler implementation for
|
|
|
|
|
|
|
|
big-endian than it uses for little-endian has a sample
|
|
|
|
|
|
|
|
compiler implementation for both BE and LE. These sample
|
|
|
|
|
|
|
|
implementations are only intended as examples; designers of a
|
|
|
|
|
|
|
|
compiler are free to use other methods to implement the
|
|
|
|
|
|
|
|
specified semantics as they see fit.
|
|
|
|
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<section>
|
|
|
|
|
|
|
|
<title>Extended Data Movement Functions</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
The built-in functions in <xref
|
|
|
|
|
|
|
|
linkend="VIPR.biendian.vmx-mem" /> map to AltiVec/VMX load and
|
|
|
|
|
|
|
|
store instructions and provide access to the “auto-aligning”
|
|
|
|
|
|
|
|
memory instructions of the VMX ISA where low-order address
|
|
|
|
|
|
|
|
bits are discarded before performing a memory access. These
|
|
|
|
|
|
|
|
instructions load and store data in accordance with the
|
|
|
|
|
|
|
|
program's current endian mode, and do not need to be adapted
|
|
|
|
|
|
|
|
by the compiler to reflect little-endian operation during code
|
|
|
|
|
|
|
|
generation.
|
|
|
|
|
|
|
|
</para>
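<para>
The effect of discarding low-order address bits can be sketched as plain address arithmetic. The example models a full 16-byte vector access such as <code>lvx</code>, for which the low-order four bits of the effective address are ignored; the function name <code>lvx_effective_address</code> is hypothetical.
</para>

```c
#include <stdint.h>

/* Model of the address used by a 16-byte auto-aligning access:
   the sum of base and offset with its low-order four bits cleared. */
static uintptr_t lvx_effective_address(uintptr_t base, uintptr_t offset) {
    return (base + offset) & ~(uintptr_t)0xF;
}
```

<para>
In this model, every address within the same 16-byte block maps to the start of that block, which is why these built-in functions are not suitable on their own for unaligned data.
</para>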
|
|
|
|
|
|
|
|
<table frame="all" pgwide="1" xml:id="VIPR.biendian.vmx-mem">
|
|
|
|
|
|
|
|
<title>VMX Memory Access Built-In Functions</title>
|
|
|
|
|
|
|
|
<tgroup cols="3">
|
|
|
|
|
|
|
|
<colspec colname="c1" colwidth="15*" align="center" />
|
|
|
|
|
|
|
|
<colspec colname="c2" colwidth="35*" align="center" />
|
|
|
|
|
|
|
|
<colspec colname="c3" colwidth="50*" />
|
|
|
|
|
|
|
|
<thead>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
<emphasis role="bold">Built-in Function</emphasis>
|
|
|
|
|
|
|
|
</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
<emphasis role="bold">Corresponding POWER
|
|
|
|
|
|
|
|
Instructions</emphasis>
|
|
|
|
|
|
|
|
</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry align="center">
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
<emphasis role="bold">Implementation Notes</emphasis>
|
|
|
|
|
|
|
|
</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
</row>
|
|
|
|
|
|
|
|
</thead>
|
|
|
|
|
|
|
|
<tbody>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>vec_ld</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>lvx</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>Hardware behavior depends on the endian mode; no compiler adaptation is needed.</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>vec_lde</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>lvebx, lvehx, lvewx</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>Hardware behavior depends on the endian mode; no compiler adaptation is needed.</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>vec_ldl</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>lvxl</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>Hardware behavior depends on the endian mode; no compiler adaptation is needed.</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>vec_st</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>stvx</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>Hardware behavior depends on the endian mode; no compiler adaptation is needed.</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>vec_ste</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>stvebx, stvehx, stvewx</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>Hardware behavior depends on the endian mode; no compiler adaptation is needed.</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>vec_stl</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>stvxl</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
<entry>
|
|
|
|
|
|
|
|
<para>Hardware behavior depends on the endian mode; no compiler adaptation is needed.</para>
|
|
|
|
|
|
|
|
</entry>
|
|
|
|
|
|
|
|
</row>
|
|
|
|
|
|
|
|
</tbody>
|
|
|
|
|
|
|
|
</tgroup>
|
|
|
|
|
|
|
|
</table>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
Previous versions of the VMX built-in functions defined
|
|
|
|
|
|
|
|
intrinsics to access the VMX instructions <code>lvsl</code>
|
|
|
|
|
|
|
|
and <code>lvsr</code>, which could be used in conjunction with
|
|
|
|
|
|
|
|
<code>vec_perm</code> and VMX load and store instructions for
|
|
|
|
|
|
|
|
unaligned access. The <code>vec_lvsl</code> and
|
|
|
|
|
|
|
|
<code>vec_lvsr</code> interfaces are deprecated in favor of
|
|
|
|
|
|
|
|
the interfaces specified here. For compatibility, the
|
|
|
|
|
|
|
|
built-in pseudo sequences published in previous VMX documents
|
|
|
|
|
|
|
|
continue to work with little-endian data layout and the
|
|
|
|
|
|
|
|
little-endian vector layout described in this
|
|
|
|
|
|
|
|
document. However, the use of these sequences in new code is
|
|
|
|
|
|
|
|
discouraged and usually results in worse performance. It is
|
|
|
|
|
|
|
|
recommended (but not required) that compilers issue a warning
|
|
|
|
|
|
|
|
when these functions are used in little-endian
|
|
|
|
|
|
|
|
environments. It is recommended that programmers use the
|
|
|
|
|
|
|
|
<code>vec_xl</code> and <code>vec_xst</code> vector built-in
|
|
|
|
|
|
|
|
functions to access unaligned data streams. See the
|
|
|
|
|
|
|
|
descriptions of these built-in functions in <xref
|
|
|
|
|
|
|
|
linkend="VIPR.vec-ref" /> for further description and
|
|
|
|
|
|
|
|
implementation details.
|
|
|
|
|
|
|
|
</para>
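<para>
A minimal sketch of the recommended approach follows. It assumes a compiler that provides <code>vec_xl</code> and <code>vec_xst</code> through <code>altivec.h</code> when VSX is enabled; the function name <code>copy4_unaligned</code> is hypothetical, and the <code>memcpy</code> branch is only a stand-in so that the sketch builds on hosts without VSX.
</para>

```c
#include <string.h>

#if defined(__VSX__)
#include <altivec.h>
#endif

/* Copy four ints between addresses that need not be 16-byte aligned. */
static void copy4_unaligned(int *dst, const int *src) {
#if defined(__VSX__)
    /* Load and store without requiring 16-byte alignment. */
    __vector signed int v = vec_xl(0, src);
    vec_xst(v, 0, dst);
#else
    /* Assumption: portable stand-in for hosts without VSX support. */
    memcpy(dst, src, 4 * sizeof *src);
#endif
}
```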
|
|
|
|
|
|
|
|
</section>
|
|
|
|
|
|
|
|
<section>
|
|
|
|
|
|
|
|
<title>Big-Endian Vector Layout in Little-Endian Environments
|
|
|
|
|
|
|
|
(Deprecated)</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
|
|
|
Versions 1.0 through 1.4 of the 64-Bit ELFv2 ABI Specification
|
|
|
|
|
|
|
|
for POWER provided for optional compiler support for using
|
|
|
|
|
|
|
|
big-endian element ordering in little-endian environments.
|
|
|
|
|
|
|
|
This was initially deemed useful for porting certain libraries
|
|
|
|
|
|
|
|
that assumed big-endian element ordering regardless of the
|
|
|
|
|
|
|
|
endianness of their input streams. In practice, this
|
|
|
|
|
|
|
|
introduced serious compiler complexity without much utility.
|
|
|
|
|
|
|
|
Thus this support (previously controlled by switches
|
|
|
|
|
|
|
|
<code>-maltivec=be</code> and/or <code>-qaltivec=be</code>) is
|
|
|
|
|
|
|
|
now deprecated. Current versions of the GCC and Clang
|
|
|
|
|
|
|
|
open-source compilers do not implement this support.
|
|
|
|
|
|
|
|
</para>
|
|
|
|
|
|
|
|
</section>
|
|
|
|
|
|
|
|
</section>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<section>
|
|
|
|
|
|
|
|
<title>Language-Specific Vector Support for Other
|
|
|
|
|
|
|
|
Languages</title>
|
|
|
|
<para>
|
|
|
|
<para>
|
|
|
|
filler
|
|
|
|
filler
|
|
|
|
</para>
|
|
|
|
</para>
|
|
|
|