|
|
<!--
|
|
|
Copyright (c) 2019 OpenPOWER Foundation
|
|
|
|
|
|
Licensed under the Apache License, Version 2.0 (the "License");
|
|
|
you may not use this file except in compliance with the License.
|
|
|
You may obtain a copy of the License at
|
|
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
|
See the License for the specific language governing permissions and
|
|
|
limitations under the License.
|
|
|
|
|
|
-->
|
|
|
<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
|
|
|
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
|
|
|
|
|
|
<!-- Chapter Title goes here. -->
|
|
|
<title>The POWER Bi-Endian Vector Programming Model</title>
|
|
|
|
|
|
<para>
|
|
|
To ensure portability of applications optimized to exploit the
|
|
|
SIMD functions of POWER ISA processors, this reference defines a
|
|
|
set of functions and data types for SIMD programming. Compliant
|
|
|
compilers will provide suitable support for these functions,
|
|
|
preferably as built-in functions that translate to one or more
|
|
|
POWER ISA instructions.
|
|
|
</para>
|
|
|
<para>
|
|
|
Compilers are encouraged, but not required, to provide built-in
|
|
|
functions to access individual instructions in the IBM POWER®
|
|
|
instruction set architecture. In most cases, each such built-in
|
|
|
function should provide direct access to the underlying
|
|
|
instruction.
|
|
|
</para>
|
|
|
<para>
|
|
|
However, to ease porting between little-endian (LE) and big-endian
|
|
|
(BE) POWER systems, and between POWER and other platforms, it is
|
|
|
preferable that some built-in functions provide the same semantics
|
|
|
on both LE and BE POWER systems, even if this means that the
|
|
|
built-in functions are implemented with different instruction
|
|
|
sequences for LE and BE. To achieve this, the vector built-in
functions are derived from the set of hardware operations provided
by the POWER SIMD instructions. Unlike traditional “hardware
intrinsic” built-in functions, these built-in functions have no
fixed mapping to a generated hardware instruction sequence. Rather,
the compiler is free to generate optimized instruction sequences
that implement the semantics of the program specified by the
programmer using these built-in functions.
|
|
|
</para>
|
|
|
<para>
|
|
|
As we've seen, the POWER SIMD instructions operate on groups of 1,
|
|
|
2, 4, 8, or 16 vector elements at a time in 128-bit registers. On
|
|
|
a big-endian POWER platform, vector elements are loaded from
|
|
|
memory into a register so that the 0th element occupies the
|
|
|
high-order bits of the register, and the (N – 1)th element
|
|
|
occupies the low-order bits of the register. This is referred to
|
|
|
as big-endian element order. On a little-endian POWER platform,
|
|
|
vector elements are loaded from memory such that the 0th element
|
|
|
occupies the low-order bits of the register, and the (N –
|
|
|
1)th element occupies the high-order bits. This is referred to as
|
|
|
little-endian element order.
|
|
|
</para>
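<para>
The following C sketch models where each element sits in a 128-bit
register under the two element orders. It is purely illustrative: the
helper <code>element_low_bit</code> is hypothetical, and it numbers
bits from the least-significant end (bit 0 is the low-order bit),
whereas the POWER ISA itself numbers bit 0 as the most-significant bit.
</para>

```c
#include <stdbool.h>

/* In this sketch, bits are counted from the least-significant end of a
   128-bit register: bit 0 is the low-order bit, bit 127 the high-order bit.
   (This is NOT the POWER ISA bit numbering, which starts at the MSB.) */
enum { VREG_BITS = 128 };

/* Hypothetical helper: return the lowest bit number occupied by element i
   of a register holding elements of elem_bits bits each (elem_bits must
   divide 128). Big-endian element order puts element 0 in the high-order
   bits; little-endian element order puts element 0 in the low-order bits. */
static unsigned element_low_bit(unsigned i, unsigned elem_bits,
                                bool big_endian_order)
{
    unsigned n = VREG_BITS / elem_bits;              /* number of elements, N */
    unsigned slot_from_low = big_endian_order ? (n - 1 - i) : i;
    return slot_from_low * elem_bits;
}
```

<para>
For 32-bit elements, element 0 occupies bits 96 through 127 (the
high-order word) in big-endian element order, and bits 0 through 31
in little-endian element order.
</para>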
|
|
|
|
|
|
<note>
|
|
|
<para>
|
|
|
Much of the information in this chapter was formerly part of
|
|
|
Chapter 6 of the 64-Bit ELF V2 ABI Specification for POWER.
|
|
|
</para>
|
|
|
</note>
|
|
|
|
|
|
<section>
|
|
|
<title>Language Elements</title>
|
|
|
<para>
|
|
|
The C and C++ languages are extended to use new identifiers
|
|
|
<code>vector</code>, <code>pixel</code>, <code>bool</code>,
|
|
|
<code>__vector</code>, <code>__pixel</code>, and
|
|
|
<code>__bool</code>. These keywords are used to specify vector
|
|
|
data types (<xref linkend="VIPR.ch-data-types" />). Because
|
|
|
these identifiers may conflict with keywords in more recent C
|
|
|
and C++ language standards, compilers may implement these in one
|
|
|
of two ways.
|
|
|
</para>
|
|
|
<itemizedlist>
|
|
|
<listitem>
|
|
|
<para>
|
|
|
<code>__vector</code>, <code>__pixel</code>,
|
|
|
<code>__bool</code>, and <code>bool</code> are defined as
|
|
|
keywords, with <code>vector</code> and <code>pixel</code> as
|
|
|
predefined macros that expand to <code>__vector</code> and
|
|
|
<code>__pixel</code>, respectively.
|
|
|
</para>
|
|
|
</listitem>
|
|
|
<listitem>
|
|
|
<para>
|
|
|
<code>__vector</code>, <code>__pixel</code>, and
|
|
|
<code>__bool</code> are defined as keywords in all contexts,
|
|
|
while <code>vector</code>, <code>pixel</code>, and
|
|
|
<code>bool</code> are treated as keywords only within the
|
|
|
context of a type declaration.
|
|
|
</para>
|
|
|
</listitem>
|
|
|
</itemizedlist>
|
|
|
<para>
|
|
|
Vector literals may be specified using a type cast and a set of
|
|
|
literal initializers in parentheses or braces. For example,
|
|
|
</para>
|
|
|
<programlisting>vector int x = (vector int) (4, -1, 3, 6);
|
|
|
vector double g = (vector double) { 3.5, -24.6 };</programlisting>
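<para>
The POWER vector keywords require a compiler with POWER SIMD support
enabled. As a portable approximation that any GCC or Clang target
accepts, the sketch below uses GCC's generic vector extension
(<code>__attribute__((vector_size(16)))</code>) in place of
<code>vector int</code> and <code>vector double</code>; the typedef
names <code>v4si</code> and <code>v2df</code> are assumptions made for
illustration. In this sketch the brace form fills elements 0, 1, and
so on in initializer order.
</para>

```c
/* Portable stand-ins for vector int and vector double, using the
   GCC/Clang generic vector extension (not the POWER SIMD keywords). */
typedef int    v4si __attribute__((vector_size(16)));
typedef double v2df __attribute__((vector_size(16)));

/* Brace-form vector literals: initializers fill elements 0, 1, ... in order. */
static v4si make_x(void) { return (v4si){ 4, -1, 3, 6 }; }
static v2df make_g(void) { return (v2df){ 3.5, -24.6 }; }
```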
|
|
|
</section>
|
|
|
|
|
|
<section xml:id="VIPR.ch-data-types">
|
|
|
<title>Vector Data Types</title>
|
|
|
<para>
|
|
|
Languages provide support for the data types in <xref
|
|
|
linkend="VIPR.biendian.vectypes" /> to represent vector data
|
|
|
types stored in vector registers.
|
|
|
</para>
|
|
|
<para>
|
|
|
For the C and C++ programming languages (and related/derived
languages), these data types can be referred to by the type names
listed in <xref linkend="VIPR.biendian.vectypes" /> when the POWER
SIMD language extensions are enabled, using either the
<code>vector</code> or the <code>__vector</code> keyword.
|
|
|
</para>
|
|
|
<para>
|
|
|
For the Fortran language, <xref
|
|
|
linkend="VIPR.biendian.fortran-types" /> gives a correspondence
|
|
|
between Fortran and C/C++ language types.
|
|
|
</para>
|
|
|
<para>
|
|
|
The assignment operator always performs a byte-by-byte data copy
|
|
|
for vector data types.
|
|
|
</para>
|
|
|
<para>
|
|
|
Like other C/C++ language types, vector types may be qualified as
const or volatile. Vector objects may be declared with the static,
auto, or register storage class.
|
|
|
</para>
|
|
|
<para>
|
|
|
Pointers to vector types are defined like pointers to other
C/C++ types, and may have const and volatile properties. The
address held in a pointer to a vector object must be divisible by
16, because vector objects are always aligned on quadword
(128-bit) boundaries.
|
|
|
</para>
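<para>
A minimal sketch of the alignment rule, again using a GCC/Clang
generic vector type as a stand-in for <code>vector int</code>: the
hypothetical helper <code>is_quadword_aligned</code> tests whether an
address is divisible by 16, and the static assertions check that the
stand-in type has a 16-byte size and quadword alignment. These
assertions hold for the generic extension on common GCC and Clang
targets; they are not a statement about every compiler.
</para>

```c
#include <stdbool.h>
#include <stdint.h>

typedef int v4si __attribute__((vector_size(16)));  /* stand-in for vector int */

/* Hypothetical helper: true if p may be dereferenced as a vector object,
   i.e., its address is divisible by 16 (quadword aligned). */
static bool is_quadword_aligned(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}

_Static_assert(sizeof(v4si) == 16, "vector types occupy 16 bytes");
_Static_assert(_Alignof(v4si) == 16, "vector types are quadword aligned");
```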
|
|
|
<para>
|
|
|
The preferred way to access vectors at an application-defined
address is by using vector pointers and the C/C++ dereference
operator <code>*</code>. As with other C/C++ data types, the array
reference operator <code>[]</code> may be applied to a vector
pointer with its usual meaning: <code>p[n]</code> accesses the
<emphasis>n</emphasis>th vector object starting at the address
held in the vector pointer <code>p</code>. The dereference operator
<code>*</code> may <emphasis>not</emphasis> be used to access data
that is not aligned to at least a quadword boundary. Built-in
functions such as <code>vec_xl</code> and <code>vec_xst</code> are
provided for unaligned data access.
|
|
|
</para>
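<para>
On POWER, <code>vec_xl</code> and <code>vec_xst</code> are the
intended interfaces for unaligned access. To illustrate the idea
portably, the sketch below performs an unaligned load and store by
copying bytes with <code>memcpy</code> into a GCC/Clang generic vector
type, which avoids dereferencing a misaligned vector pointer. The
helpers <code>load_v4si_unaligned</code> and
<code>store_v4si_unaligned</code> are hypothetical and do not reproduce
the <code>vec_xl</code>/<code>vec_xst</code> signatures.
</para>

```c
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));  /* stand-in for vector int */

/* Load four ints starting at p, which need not be 16-byte aligned. */
static v4si load_v4si_unaligned(const int *p)
{
    v4si v;
    memcpy(&v, p, sizeof v);   /* byte copy: no alignment requirement on p */
    return v;
}

/* Store four ints starting at p, which need not be 16-byte aligned. */
static void store_v4si_unaligned(int *p, v4si v)
{
    memcpy(p, &v, sizeof v);
}
```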
|
|
|
<para>
|
|
|
One vector type may be cast to another vector type without
|
|
|
restriction. Such a cast is simply a reinterpretation of the
|
|
|
bits, and does not change the data.
|
|
|
</para>
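<para>
The sketch below illustrates a reinterpreting cast, with GCC/Clang
generic vector types standing in for <code>vector signed int</code>
and <code>vector unsigned char</code> (an assumption made for
portability). Casting between two 16-byte vector types leaves the
underlying bytes untouched, which the hypothetical helper
<code>same_bits</code> checks with <code>memcmp</code>.
</para>

```c
#include <stdbool.h>
#include <string.h>

typedef int           v4si  __attribute__((vector_size(16)));  /* ~ vector signed int    */
typedef unsigned char v16qu __attribute__((vector_size(16)));  /* ~ vector unsigned char */

/* A cast between equal-sized vector types reinterprets the same 16 bytes;
   no value conversion takes place. */
static v16qu as_bytes(v4si v)
{
    return (v16qu)v;
}

/* Check that the cast left the underlying bytes unchanged. */
static bool same_bits(v4si v)
{
    v16qu b = as_bytes(v);
    return memcmp(&v, &b, sizeof v) == 0;
}
```

<para>
Which byte of <code>as_bytes(v)</code> holds the low-order byte of each
word depends on the target's endianness; the cast only preserves the
16-byte sequence as a whole.
</para>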
|
|
|
<para>
|
|
|
Compilers are expected to recognize sequences of operations that
can be combined into a single hardware instruction. For example, a
load-and-splat hardware instruction might be generated for the
following sequence:
|
|
|
</para>
|
|
|
<programlisting>double *double_ptr;
|
|
|
register vector double vd = vec_splats(*double_ptr);</programlisting>
|
|
|
<table frame="all" pgwide="1" xml:id="VIPR.biendian.vectypes">
|
|
|
<title>Vector Types</title>
|
|
|
<tgroup cols="4">
|
|
|
<colspec colname="c1" colwidth="20*" />
|
|
|
<colspec colname="c2" colwidth="10*" align="center" />
|
|
|
<colspec colname="c3" colwidth="15*" align="center" />
|
|
|
<colspec colname="c4" colwidth="40*" />
|
|
|
<thead>
|
|
|
<row>
|
|
|
<entry align="center">
|
|
|
<para>
|
|
|
<emphasis role="bold">POWER SIMD C Types</emphasis>
|
|
|
</para>
|
|
|
</entry>
|
|
|
<entry align="center">
|
|
|
<para>
|
|
|
<emphasis role="bold">sizeof</emphasis>
|
|
|
</para>
|
|
|
</entry>
|
|
|
<entry align="center">
|
|
|
<para>
|
|
|
<emphasis role="bold">Alignment</emphasis>
|
|
|
</para>
|
|
|
</entry>
|
|
|
<entry align="center">
|
|
|
<para>
|
|
|
<emphasis role="bold">Description</emphasis>
|
|
|
</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
</thead>
|
|
|
<tbody>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector unsigned char</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 16 unsigned bytes.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector signed char</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 16 signed bytes.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector bool char</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 16 bytes with a value of either 0 or
|
|
|
2<superscript>8</superscript> – 1.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector unsigned short</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 8 unsigned halfwords.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector signed short</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 8 signed halfwords.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector bool short</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 8 halfwords with a value of either 0 or
|
|
|
2<superscript>16</superscript> – 1.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector pixel</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 8 halfwords, each interpreted as a 1-bit
|
|
|
channel and three 5-bit channels.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector unsigned int</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 4 unsigned words.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector signed int</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 4 signed words.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector bool int</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 4 words with a value of either 0 or
|
|
|
2<superscript>32</superscript> – 1.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector unsigned long<footnote xml:id="vlong">
|
|
|
<para>The vector long types are deprecated due to their
|
|
|
ambiguity between 32-bit and 64-bit environments. The use
|
|
|
of the vector long long types is preferred.</para>
|
|
|
</footnote></para>
|
|
|
<para>vector unsigned long long</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 2 unsigned doublewords.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector signed long<footnoteref linkend="vlong" /></para>
|
|
|
<para>vector signed long long</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 2 signed doublewords.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector bool long<footnoteref linkend="vlong" /></para>
|
|
|
<para>vector bool long long</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 2 doublewords with a value of either 0 or
|
|
|
2<superscript>64</superscript> – 1.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector unsigned __int128</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 1 unsigned quadword.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector signed __int128</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 1 signed quadword.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector _Float16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 8 half-precision floats.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector float</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 4 single-precision floats.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vector double</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>16</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Quadword</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector of 2 double-precision floats.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
</tbody>
|
|
|
</tgroup>
|
|
|
</table>
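<para>
In <xref linkend="VIPR.biendian.vectypes" />, the
<code>vector pixel</code> entry describes each halfword as one 1-bit
channel and three 5-bit channels. Assuming the 1-bit channel occupies
the most-significant bit and the three 5-bit channels follow toward
the least-significant end (the 1/5/5/5 layout used by the VMX pixel
instructions), a single halfword could be packed and unpacked as
sketched below. The helper names are hypothetical, and the table
itself does not name the channels.
</para>

```c
#include <stdint.h>

/* Hypothetical helpers for one 1/5/5/5 pixel halfword (assumed layout):
   bit 15 = 1-bit channel; bits 14..10, 9..5, 4..0 = three 5-bit channels. */
static uint16_t pack_pixel(unsigned flag, unsigned c0, unsigned c1, unsigned c2)
{
    return (uint16_t)(((flag & 0x1u) << 15) | ((c0 & 0x1Fu) << 10) |
                      ((c1 & 0x1Fu) << 5)   |  (c2 & 0x1Fu));
}

/* Extract 5-bit channel 'which' (0, 1, or 2, from most to least significant). */
static unsigned pixel_channel(uint16_t px, int which)
{
    return (px >> (10 - 5 * which)) & 0x1Fu;
}

/* Extract the 1-bit channel. */
static unsigned pixel_flag(uint16_t px)
{
    return (px >> 15) & 0x1u;
}
```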
|
|
|
</section>
|
|
|
|
|
|
<section>
|
|
|
<title>Vector Operators</title>
|
|
|
<para>
|
|
|
In addition to the dereference and assignment operators, the
POWER Bi-Endian Vector Programming Model supports the usual
pointer operators on pointers to vector types, just as for
pointers to other types.
|
|
|
</para>
|
|
|
<para>
|
|
|
The traditional C/C++ operators are defined on vector types
with “do all” semantics for unary and binary <code>+</code>,
unary and binary <code>-</code>, binary <code>*</code>, binary
<code>%</code>, and binary <code>/</code>, as well as for the
shift operators, the unary and binary logical operators, the
comparison operators, and the ternary <code>?:</code> operator.
|
|
|
</para>
|
|
|
<para>
|
|
|
For unary operators, the specified operation is performed on
|
|
|
the corresponding base element of the single operand to derive
|
|
|
the result value for each vector element of the vector
|
|
|
result. The result type of unary operations is the type of the
|
|
|
single input operand.
|
|
|
</para>
|
|
|
<para>
|
|
|
For binary operators, the specified operation is performed on
|
|
|
the corresponding base elements of both operands to derive the
|
|
|
result value for each vector element of the vector
|
|
|
result. Both operands of the binary operators must have the
|
|
|
same vector type with the same base element type. The result
|
|
|
of binary operators is the same type as the type of the input
|
|
|
operands.
|
|
|
</para>
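<para>
As a portable illustration of “do all” semantics, the sketch below
applies binary operators to GCC/Clang generic vectors standing in for
<code>vector signed int</code>: each operator acts independently on
corresponding elements. For comparisons, the generic extension yields
0 for false and -1 (all bits set) for true in each element, the same
values a <code>vector bool int</code> element holds; the result type
under the POWER model itself may differ from this stand-in. The
ternary operator is omitted because GCC's generic extension accepts it
only in C++. The function names are hypothetical.
</para>

```c
typedef int v4si __attribute__((vector_size(16)));  /* stand-in for vector signed int */

/* Binary operators apply element by element ("do all" semantics). */
static v4si add_all(v4si a, v4si b) { return a + b; }
static v4si mul_all(v4si a, v4si b) { return a * b; }
static v4si shl_all(v4si a, v4si n) { return a << n; }  /* per-element shift counts */

/* Element-wise comparison: 0 where false, -1 (all bits set) where true. */
static v4si gt_mask(v4si a, v4si b) { return a > b; }
```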
|
|
|
<para>
|
|
|
Further, the array reference operator may be applied to vector
|
|
|
data types, yielding an l-value corresponding to the specified
|
|
|
element in accordance with the vector element numbering rules (see
|
|
|
<xref linkend="VIPR.biendian.layout" />). An l-value may either
|
|
|
be assigned a new value or accessed for reading its value.
|
|
|
</para>
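<para>
The following sketch shows the subscript operator used as an l-value
on a GCC/Clang generic vector standing in for
<code>vector float</code>: <code>v[i]</code> can be both assigned and
read. The helper names are hypothetical.
</para>

```c
typedef float v4sf __attribute__((vector_size(16)));  /* stand-in for vector float */

/* v[i] names element i of the vector; assigning to it updates that element. */
static v4sf set_element(v4sf v, int i, float x)
{
    v[i] = x;          /* write through the l-value */
    return v;
}

/* Reading v[i] yields the current value of element i. */
static float get_element(v4sf v, int i)
{
    return v[i];
}
```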
|
|
|
</section>
|
|
|
|
|
|
<section xml:id="VIPR.biendian.layout">
|
|
|
<title>Vector Layout and Element Numbering</title>
|
|
|
<para>
|
|
|
Vector data types consist of a homogeneous sequence of elements
|
|
|
of the base data type specified in the vector data
|
|
|
type. Individual elements of a vector can be addressed by a
|
|
|
vector element number. Element numbers can be established either
|
|
|
by counting from the “left” of a register and assigning the
|
|
|
left-most element the element number 0, or from the “right” of
|
|
|
the register and assigning the right-most element the element
|
|
|
number 0.
|
|
|
</para>
|
|
|
<para>
|
|
|
In big-endian environments, establishing element counts from the
|
|
|
left makes the element stored at the lowest memory address the
|
|
|
lowest-numbered element. Thus, when vectors and arrays of a
|
|
|
given base data type are overlaid, vector element 0 corresponds
|
|
|
to array element 0, vector element 1 corresponds to array
|
|
|
element 1, and so forth.
|
|
|
</para>
|
|
|
<para>
|
|
|
In little-endian environments, establishing element counts from
|
|
|
the right makes the element stored at the lowest memory address
|
|
|
the lowest-numbered element. Thus, when vectors and arrays of a
given base data type are overlaid, vector element 0 corresponds
to array element 0, vector element 1 corresponds to array
element 1, and so forth.
|
|
|
</para>
|
|
|
<para>
|
|
|
Consequently, the vector numbering schemes can be described as
|
|
|
big-endian and little-endian vector layouts and vector element
|
|
|
numberings.
|
|
|
</para>
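<para>
Because element numbers follow memory order in both layouts, a vector
overlaid on an array of its base type has vector element i at array
element i on big-endian and little-endian targets alike. GCC's generic
vector extension subscripts vectors in the same way, so the sketch
below can check the correspondence on any GCC or Clang host; the union
and function names are hypothetical.
</para>

```c
#include <stdbool.h>

typedef short v8hi __attribute__((vector_size(16)));  /* stand-in for vector signed short */

/* A vector and an array of its base type sharing the same 16 bytes. */
union overlay {
    v8hi  vec;
    short arr[8];
};

/* True if every vector element i matches array element i. */
static bool elements_match_array(const union overlay *u)
{
    for (int i = 0; i < 8; i++)
        if (u->vec[i] != u->arr[i])
            return false;
    return true;
}
```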
|
|
|
<para>
|
|
|
This element numbering shall also be used by the <code>[]</code>
|
|
|
accessor method to vector elements provided as an extension of
|
|
|
the C/C++ languages by some compilers, as well as for other
|
|
|
language extensions or library constructs that directly or
|
|
|
indirectly refer to elements by their element number.
|
|
|
</para>
|
|
|
<para>
|
|
|
Application programs may query the vector element ordering in
use by testing the <code>__VEC_ELEMENT_REG_ORDER__</code> macro.
This macro has two possible values:
|
|
|
</para>
|
|
|
<informaltable frame="none" rowsep="0" colsep="0">
|
|
|
<tgroup cols="2">
|
|
|
<colspec colname="c1" colwidth="40*" />
|
|
|
<colspec colname="c2" colwidth="60*" />
|
|
|
<tbody>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>__ORDER_LITTLE_ENDIAN__</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector elements use little-endian element ordering.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>__ORDER_BIG_ENDIAN__</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Vector elements use big-endian element ordering.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
</tbody>
|
|
|
</tgroup>
|
|
|
</informaltable>
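<para>
A minimal sketch of querying the element order at compile time. It
tests <code>__VEC_ELEMENT_REG_ORDER__</code> as described above; when
that macro is not defined (for example, on a non-POWER host), it falls
back to the GCC/Clang <code>__BYTE_ORDER__</code> macro, which is an
assumption of this sketch rather than part of the programming model.
The function name is hypothetical.
</para>

```c
/* Report the vector element order in use. */
static const char *vector_element_order(void)
{
#if defined(__VEC_ELEMENT_REG_ORDER__)
#  if __VEC_ELEMENT_REG_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return "little-endian";
#  else
    return "big-endian";
#  endif
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    /* Fallback (assumption): use the scalar byte order. */
    return "little-endian";
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return "big-endian";
#else
    return "unknown";
#endif
}
```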
|
|
|
</section>
|
|
|
|
|
|
<section>
|
|
|
<title>Vector Built-In Functions</title>
|
|
|
<para>
|
|
|
Some of the POWER SIMD hardware instructions refer, implicitly
|
|
|
or explicitly, to vector element numbers. For example, the
|
|
|
<code>vspltb</code> instruction has as one of its inputs an
|
|
|
index into a vector. The element at that index position is to
|
|
|
be replicated in every element of the output vector. As another
example, the <code>vmuleuh</code> instruction operates on
the even-numbered elements of its input vectors. The hardware
|
|
|
instructions define these element numbers using big-endian
|
|
|
element order, even when the machine is running in little-endian
|
|
|
mode. Thus, a built-in function that maps directly to the
|
|
|
underlying hardware instruction, regardless of the target
|
|
|
endianness, has the potential to confuse programmers on
|
|
|
little-endian platforms.
|
|
|
</para>
|
|
|
<para>
|
|
|
It is more useful for the built-in functions that map to these
instructions to use natural element order. That is, the
|
|
|
explicit or implicit element numbers specified by such built-in
|
|
|
functions should be interpreted using big-endian element order
|
|
|
on a big-endian platform, and using little-endian element order
|
|
|
on a little-endian platform.
|
|
|
</para>
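<para>
The index adjustment implied by natural element order can be sketched
as a small helper. For a vector of <code>n</code> elements, natural
element <code>i</code> on a little-endian platform corresponds to
big-endian (hardware) element <code>n - 1 - i</code>; for a byte splat
such as <code>vspltb</code>, that is hardware index <code>15 - i</code>.
One consequence is that the natural even-numbered elements on
little-endian are the hardware odd-numbered elements, so a compiler
could implement an “even” built-in with the corresponding “odd”
instruction. The helper name is hypothetical, and actual compiler
implementations may differ.
</para>

```c
#include <stdbool.h>

/* Hypothetical helper: translate an element number given in natural
   element order into the big-endian element number used by the hardware
   instructions. n is the number of elements in the vector (16, 8, 4, 2, 1). */
static unsigned natural_to_hw_index(unsigned natural_index, unsigned n,
                                    bool little_endian)
{
    return little_endian ? n - 1 - natural_index : natural_index;
}
```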
|
|
|
<para>
|
|
|
The descriptions of the built-in functions in <xref
|
|
|
linkend="VIPR.vec-ref" /> contain notes on endian issues that
|
|
|
apply to each built-in function. Furthermore, each built-in
function that requires a different compiler implementation for
big-endian than for little-endian includes a sample compiler
implementation for both BE and LE. These sample
|
|
|
implementations are only intended as examples; designers of a
|
|
|
compiler are free to use other methods to implement the
|
|
|
specified semantics as they see fit.
|
|
|
</para>
|
|
|
<section>
|
|
|
<title>Extended Data Movement Functions</title>
|
|
|
<para>
|
|
|
The built-in functions in <xref linkend="VIPR.biendian.vmx-mem" />
map to AltiVec/VMX load and store instructions and provide access
to the “auto-aligning” memory instructions of the VMX ISA, in which
the low-order address bits are discarded before the memory access
is performed. These instructions load and store data in accordance
with the program's current endian mode, and do not need to be
adapted by the compiler to reflect little-endian operation during
code generation.
|
|
|
</para>
|
|
|
<table frame="all" pgwide="1" xml:id="VIPR.biendian.vmx-mem">
|
|
|
<title>VMX Memory Access Built-In Functions</title>
|
|
|
<tgroup cols="3">
|
|
|
<colspec colname="c1" colwidth="15*" align="center" />
|
|
|
<colspec colname="c2" colwidth="35*" align="center" />
|
|
|
<colspec colname="c3" colwidth="50*" />
|
|
|
<thead>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>
|
|
|
<emphasis role="bold">Built-in Function</emphasis>
|
|
|
</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>
|
|
|
<emphasis role="bold">Corresponding POWER
|
|
|
Instructions</emphasis>
|
|
|
</para>
|
|
|
</entry>
|
|
|
<entry align="center">
|
|
|
<para>
|
|
|
<emphasis role="bold">Implementation Notes</emphasis>
|
|
|
</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
</thead>
|
|
|
<tbody>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vec_ld</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>lvx</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Hardware works as a function of endian mode.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vec_lde</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>lvebx, lvehx, lvewx</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Hardware works as a function of endian mode.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vec_ldl</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>lvxl</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Hardware works as a function of endian mode.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vec_st</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>stvx</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Hardware works as a function of endian mode.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vec_ste</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>stvebx, stvehx, stvewx</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Hardware works as a function of endian mode.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>vec_stl</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>stvxl</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Hardware works as a function of endian mode.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
</tbody>
|
|
|
</tgroup>
|
|
|
</table>
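<para>
To make the auto-aligning behavior concrete, the sketch below computes
the quadword address actually accessed by <code>lvx</code>,
<code>lvxl</code>, <code>stvx</code>, and <code>stvxl</code>: the
low-order four bits of the address are ignored. The element forms
(<code>lvebx</code>, <code>stvebx</code>, and so on) are not modeled,
and the function name is hypothetical.
</para>

```c
#include <stdint.h>

/* Hypothetical model of the effective address used by the quadword forms
   lvx, lvxl, stvx, and stvxl: the low-order four address bits are
   discarded, so the access covers the aligned quadword containing addr. */
static uintptr_t vmx_aligned_ea(uintptr_t addr)
{
    return addr & ~(uintptr_t)0xF;
}
```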
|
|
|
<para>
|
|
|
Previous versions of the VMX built-in functions defined
|
|
|
intrinsics to access the VMX instructions <code>lvsl</code>
|
|
|
and <code>lvsr</code>, which could be used in conjunction with
|
|
|
<code>vec_perm</code> and VMX load and store instructions for
|
|
|
unaligned access. The <code>vec_lvsl</code> and
<code>vec_lvsr</code> interfaces are deprecated in favor of the
interfaces specified here. For compatibility, the
|
|
|
built-in pseudo sequences published in previous VMX documents
|
|
|
continue to work with little-endian data layout and the
|
|
|
little-endian vector layout described in this
|
|
|
document. However, the use of these sequences in new code is
|
|
|
discouraged and usually results in worse performance. It is
|
|
|
recommended (but not required) that compilers issue a warning
|
|
|
when these functions are used in little-endian
|
|
|
environments.
|
|
|
</para>
|
|
|
<para>
|
|
|
It is recommended that programmers use the <code>vec_xl</code>
|
|
|
and <code>vec_xst</code> vector built-in functions to access
|
|
|
unaligned data streams. See the descriptions of these built-in
functions in <xref linkend="VIPR.vec-ref" /> for further
information and implementation details.
|
|
|
</para>
|
|
|
</section>
|
|
|
<section>
|
|
|
<title>Big-Endian Vector Layout in Little-Endian Environments
|
|
|
(Deprecated)</title>
|
|
|
<para>
|
|
|
Versions 1.0 through 1.4 of the 64-Bit ELFv2 ABI Specification
|
|
|
for POWER provided for optional compiler support for using
|
|
|
big-endian element ordering in little-endian environments.
|
|
|
This was initially deemed useful for porting certain libraries
|
|
|
that assumed big-endian element ordering regardless of the
|
|
|
endianness of their input streams. In practice, this
|
|
|
introduced serious compiler complexity without much utility.
|
|
|
Thus this support (previously controlled by switches
|
|
|
<code>-maltivec=be</code> and/or <code>-qaltivec=be</code>) is
|
|
|
now deprecated. Current versions of the GCC and Clang
open-source compilers do not implement this support.
|
|
|
</para>
|
|
|
</section>
|
|
|
</section>
|
|
|
|
|
|
<section>
|
|
|
<title>Language-Specific Vector Support for Other
|
|
|
Languages</title>
|
|
|
<section>
|
|
|
<title>Fortran</title>
|
|
|
<para>
|
|
|
<xref linkend="VIPR.biendian.fortran-types" /> shows the
|
|
|
correspondence between the C/C++ types described in this
|
|
|
document and their Fortran equivalents. In Fortran, the
|
|
|
Boolean vector data types are represented by
|
|
|
<code>VECTOR(UNSIGNED(</code><emphasis>n</emphasis><code>))</code>.
|
|
|
</para>
|
|
|
<table frame="all" pgwide="1" xml:id="VIPR.biendian.fortran-types">
|
|
|
<title>Fortran Vector Data Types</title>
|
|
|
<tgroup cols="2">
|
|
|
<colspec colname="c1" colwidth="50*" />
|
|
|
<colspec colname="c2" colwidth="50*" />
|
|
|
<thead>
|
|
|
<row>
|
|
|
<entry align="center">
|
|
|
<para>
|
|
|
<emphasis role="bold">XL Fortran Vector Type</emphasis>
|
|
|
</para>
|
|
|
</entry>
|
|
|
<entry align="center">
|
|
|
<para>
|
|
|
<emphasis role="bold">XL C/C++ Vector Type</emphasis>
|
|
|
</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
</thead>
|
|
|
<tbody>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(INTEGER(1))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector signed char</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(INTEGER(2))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector signed short</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(INTEGER(4))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector signed int</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(INTEGER(8))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector signed long long, vector signed long<footnote
|
|
|
xml:id="vlongappalling">
|
|
|
<para>The vector long types are deprecated due to their
|
|
|
ambiguity between 32-bit and 64-bit environments. The use
|
|
|
of the vector long long types is preferred.</para>
|
|
|
</footnote></para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(INTEGER(16))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector signed __int128</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(UNSIGNED(1))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector unsigned char</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(UNSIGNED(2))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector unsigned short</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(UNSIGNED(4))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector unsigned int</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(UNSIGNED(8))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector unsigned long long, vector unsigned long<footnoteref
|
|
|
linkend="vlongappalling" /></para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(UNSIGNED(16))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector unsigned __int128</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(REAL(4))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector float</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(REAL(8))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector double</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(PIXEL)</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector pixel</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
</tbody>
|
|
|
</tgroup>
|
|
|
</table>
|
|
|
<para>
|
|
|
Because Fortran does not support C-style pointers, vector
built-in functions that expect pointers to a base type instead
take an array element reference to indicate the address of the
memory location accessed by a memory access built-in function.
|
|
|
</para>
|
|
|
<para>
|
|
|
Because the Fortran language does not support type casts, the
|
|
|
<code>vec_convert</code> and <code>vec_concat</code> built-in
|
|
|
functions shown in <xref linkend="VIPR.endian.convert" /> are
|
|
|
provided to perform bit-exact type conversions between vector
|
|
|
types.
|
|
|
</para>
<table frame="all" pgwide="1" xml:id="VIPR.endian.convert">
<title>Built-In Vector Conversion Functions</title>
<tgroup cols="2">
<colspec colname="c1" colwidth="30*" align="center" />
<colspec colname="c2" colwidth="70*" />
<thead>
<row>
<entry>
<para>
<emphasis role="bold">Group</emphasis>
</para>
</entry>
<entry align="center">
<para>
<emphasis role="bold">Description</emphasis>
</para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<para>VEC_CONCAT (ARG1, ARG2)<?linebreak?>(Fortran)</para>
<para></para>
</entry>
<entry>
<para>Purpose:</para>
<para>Concatenates two elements to form a vector.</para>
<para>Result value:</para>
<para>The resulting vector consists of the two scalar elements,
ARG1 and ARG2, assigned to elements 0 and 1 (using the
environment’s native endian numbering), respectively.</para>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Note: </emphasis>This function corresponds to the C/C++ vector
constructor <code>(vector type){a,b}</code>. It is provided only for
languages without vector constructors.</para>
</listitem>
</itemizedlist>
</entry>
</row>
<row>
<entry>
<para></para>
</entry>
<entry>
<para>vector signed long long vec_concat (signed long long,
signed long long);</para>
</entry>
</row>
<row>
<entry>
<para></para>
</entry>
<entry>
<para>vector unsigned long long vec_concat (unsigned long long,
unsigned long long);</para>
</entry>
</row>
<row>
<entry>
<para></para>
</entry>
<entry>
<para>vector double vec_concat (double, double);</para>
</entry>
</row>
<row>
<entry>
<para>VEC_CONVERT(V, MOLD)</para>
</entry>
<entry>
<para>Purpose:</para>
<para>Converts a vector to a vector of a given type.</para>
<para>Class:</para>
<para>Pure function</para>
<para>Argument type and attributes:</para>
<itemizedlist spacing="compact">
<listitem>
<para>V Must be an INTENT(IN) vector.</para>
</listitem>
<listitem>
<para>MOLD Must be an INTENT(IN) vector. If it is a
variable, it need not be defined.</para>
</listitem>
</itemizedlist>
<para>Result type and attributes:</para>
<para>The result is a vector of the same type as MOLD.</para>
<para>Result value:</para>
<para>The result is as if it were on the left-hand side of an
intrinsic assignment with V on the right-hand side.</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
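<para>
As the note in <xref linkend="VIPR.endian.convert" /> says,
<code>vec_concat</code> corresponds to a C/C++ vector constructor.
The following sketch shows that correspondence for the
<code>double</code> form. It assumes the GCC/Clang generic vector
extension as a stand-in for <code>vector double</code>, and the
helper name is hypothetical.
</para>
<programlisting language="c"><![CDATA[#include <assert.h>

/* Hypothetical stand-in for vector double (GCC/Clang generic vector extension). */
typedef double v2df __attribute__((vector_size(16)));

/* Mirrors vec_concat (double, double): a goes to element 0 and b to
   element 1 in the native element numbering, just as the C constructor
   (vector double){a, b} would place them. */
static v2df concat_double(double a, double b)
{
    v2df v = {a, b};
    return v;
}]]></programlisting>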
</section>
</section>

<section>
<title>Examples and Limitations</title>
<section>
<title>Unaligned vector access</title>
<para>
A common programming error is to cast a pointer to a base type
(such as <code>int</code>) to a pointer of the corresponding
vector type (such as <code>vector int</code>), and then
dereference the pointer. This constitutes undefined behavior,
because it casts a pointer with a smaller alignment
requirement (4 bytes for <code>int</code>) to a pointer with a
larger alignment requirement (16 bytes for <code>vector int</code>).
In the presence of undefined behavior, compilers may not produce
the code that you expect.
</para>
<para>
Thus, do not write the following:
</para>
<programlisting> int a[4096];
vector int x = *((vector int *) a);</programlisting>
<para>
Instead, use <code>vec_xl</code>, which does not require its
address to be 16-byte aligned:
</para>
<programlisting> int a[4096];
vector int x = vec_xl (0, a);</programlisting>
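<para>
<code>vec_xl</code> is available only when compiling for POWER. As
a portable illustration of the same idea, the following sketch
loads a 16-byte vector from an address that is only
<code>int</code>-aligned. It copies the bytes with
<code>memcpy</code> instead of dereferencing a cast pointer. It
assumes the GCC/Clang generic vector extension as a stand-in for
<code>vector int</code>, and the type and function names are
hypothetical.
</para>
<programlisting language="c"><![CDATA[#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for vector int (GCC/Clang generic vector extension).
   Like vector int, this type has 16-byte alignment. */
typedef int v4si __attribute__((vector_size(16)));

/* Load four ints starting at p, which need only be int-aligned.
   memcpy makes no assumption that p is aligned for v4si, so there is no
   undefined behavior; compilers typically emit a single unaligned load. */
static v4si load_v4si_unaligned(const int *p)
{
    v4si v;
    memcpy(&v, p, sizeof v);
    return v;
}]]></programlisting>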
</section>
<section>
<title>vec_sld is not bi-endian</title>
<para>
One oddity in the bi-endian vector programming model is that
<code>vec_sld</code> has big-endian semantics for code
compiled for both big-endian and little-endian targets.
Consequently, any code that uses <code>vec_sld</code> without
guarding it with a test on endianness is likely to be incorrect.
</para>
<para>
At the time that the bi-endian model was being developed, it
was discovered that existing code in several Linux packages
was using <code>vec_sld</code> to perform multiplication,
or otherwise to shift portions of base elements left. A
straightforward little-endian implementation of
<code>vec_sld</code> would concatenate the two input vectors
in reverse order and shift bytes to the right. This would
give compatible results only for <code>vector char</code>
types. Code that uses this intrinsic as a cheap multiply, or to
shift bytes within larger elements, would see different
results on little-endian and big-endian systems with such an
implementation. Therefore it was decided that
<code>vec_sld</code> would not have a bi-endian
implementation.
</para>
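<para>
The following scalar model illustrates the problem. It is a
hypothetical sketch, not a compiler implementation. Each vector is
represented by its 16 bytes in memory order, and
<code>vec_sld</code> (a, b, sh) takes bytes sh through sh+15 of
<code>a</code> followed by <code>b</code>. On a big-endian target
this is what <code>vsldoi</code> computes. Assume the usual
little-endian register layout, in which memory byte k occupies
register byte 15-k. Under that layout, the straightforward
little-endian implementation described above (reversed
concatenation and a right shift) produces the same shift of
memory-order bytes. Applying that shift to vectors of 32-bit
integers laid out in each byte order shows where the two diverge.
</para>
<programlisting language="c"><![CDATA[#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of vec_sld on 16-byte vectors held in memory
   byte order: result byte k is byte k + sh of the concatenation a || b. */
static void sld_bytes(const uint8_t a[16], const uint8_t b[16],
                      unsigned sh, uint8_t out[16])
{
    assert(sh < 16);
    for (unsigned k = 0; k < 16; k++) {
        unsigned src = k + sh;
        out[k] = src < 16 ? a[src] : b[src - 16];
    }
}

/* Lay out four 32-bit elements in memory with the given byte order. */
static void store_u32x4(const uint32_t e[4], int little, uint8_t bytes[16])
{
    for (unsigned i = 0; i < 4; i++)
        for (unsigned j = 0; j < 4; j++)
            bytes[4 * i + j] = (uint8_t)(e[i] >> (little ? 8 * j : 8 * (3 - j)));
}

/* Read four 32-bit elements back from memory with the given byte order. */
static void load_u32x4(const uint8_t bytes[16], int little, uint32_t e[4])
{
    for (unsigned i = 0; i < 4; i++) {
        uint32_t v = 0;
        for (unsigned j = 0; j < 4; j++)
            v |= (uint32_t)bytes[4 * i + j] << (little ? 8 * j : 8 * (3 - j));
        e[i] = v;
    }
}

/* The byte shift applied to vectors of 32-bit integers on a big-endian
   (little == 0) or little-endian (little != 0) target. */
static void sld_u32x4(const uint32_t a[4], const uint32_t b[4],
                      unsigned sh, int little, uint32_t out[4])
{
    uint8_t ab[16], bb[16], rb[16];
    store_u32x4(a, little, ab);
    store_u32x4(b, little, bb);
    sld_bytes(ab, bb, sh, rb);
    load_u32x4(rb, little, out);
}]]></programlisting>
<para>
Take a = {1, 2, 3, 4}, b all zeros, and a shift of one byte. The
big-endian layout yields {0x100, 0x200, 0x300, 0x400}, with each
element multiplied by 256. The little-endian layout yields
{0x02000000, 0x03000000, 0x04000000, 0}.
</para>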
<para>
For similar reasons, <code>vec_sro</code> is not bi-endian either.
</para>
</section>
<section>
<title>Limitations on bi-endianness of vec_perm</title>
<para>
The <code>vec_perm</code> intrinsic is bi-endian, provided
that it is used to reorder entire elements of the input
vectors.
</para>
<para>
To see why, consider the code generated for the following example:
</para>
<programlisting> vector int t;
vector int a = (vector int){0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f};
vector int b = (vector int){0x10111213, 0x14151617, 0x18191a1b, 0x1c1d1e1f};
vector char c = (vector char){0,1,2,3,28,29,30,31,12,13,14,15,20,21,22,23};
t = vec_perm (a, b, c);</programlisting>
<para>
For big endian, a compiler should generate:
</para>
<programlisting> vperm t,a,b,c</programlisting>
<para>
For little endian targeting a POWER8 system, a compiler should
generate:
</para>
<programlisting> vnand d,c,c
vperm t,b,a,d</programlisting>
<para>
For little endian targeting a POWER9 system, a compiler should
generate:
</para>
<programlisting> vpermr t,b,a,c</programlisting>
<para>
Note that the <code>vpermr</code> instruction performs internally
the modification of the permute control vector (PCV)
<code>c</code> that the <code>vnand</code> instruction performs
explicitly on POWER8. Because the hardware reads only the
low-order 5 bits of each element of the PCV, complementing each
element has the effect of subtracting it from 31.
</para>
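<para>
Because 31 is all ones in the low-order 5 bits, complementing
those bits is the same as subtracting their value from 31. The
following C sketch (with a hypothetical function name) checks this
equivalence exhaustively over all byte values.
</para>
<programlisting language="c"><![CDATA[#include <assert.h>

/* Returns 1 if, for every byte value c, the low-order 5 bits of ~c
   (what vnand d,c,c leaves for vperm to read) equal 31 - (c & 31). */
static int complement_equals_31_minus(void)
{
    for (unsigned c = 0; c < 256; c++) {
        unsigned complemented = ~c & 31u;
        unsigned subtracted = 31u - (c & 31u);
        if (complemented != subtracted)
            return 0;
    }
    return 1;
}]]></programlisting>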
<para>
Note also that the PCV <code>c</code> consists of runs of 4
consecutive byte indices, each starting at a multiple of 4. This
selects entire elements from the input vectors <code>a</code>
and <code>b</code> to reorder. Thus the intent of the code is to
select the first integer element of <code>a</code>, the last
integer element of <code>b</code>, the last integer element of
<code>a</code>, and the second integer element of
<code>b</code>, in that order, giving
t = {0x00010203, 0x1c1d1e1f, 0x0c0d0e0f, 0x14151617}.
</para>
<para>
For little endian, the elements of the PCV are effectively
subtracted from 31, giving
{31,30,29,28,3,2,1,0,19,18,17,16,11,10,9,8}.
Since the elements appear in reverse order in a register when
loaded from little-endian memory, they appear in the
register from left to right as
{8,9,10,11,16,17,18,19,0,1,2,3,28,29,30,31}. So the subsequent
<code>vperm</code> instruction again selects entire
elements using the groups of 4 contiguous bytes, and the
values of the integers are reordered without compromising
each integer's contents. Verifying that the little-endian
result matches the big-endian result is left as an exercise for
the reader.
</para>
<para>
Now, suppose instead that the original PCV does not reorder
entire integers at once:
</para>
<programlisting> vector char c = (vector char){0,20,31,4,7,17,6,19,30,3,2,8,9,13,5,22};</programlisting>
<para>
The result of the big-endian implementation would be:
</para>
<programlisting> t = {0x00141f04, 0x07110613, 0x1e030208, 0x090d0516};</programlisting>
<para>
For little endian, the modified PCV would be
{31,11,0,27,24,14,25,12,1,28,29,23,22,18,26,9}, appearing in
the register as
{9,26,18,22,23,29,28,1,12,25,14,24,27,0,11,31}. The final
little-endian result would be
</para>
<programlisting> t = {0x071c1703, 0x10051204, 0x0b01001d, 0x15060e0a};</programlisting>
<para>
which bears no resemblance to the big-endian result.
</para>
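<para>
Both results can be reproduced with a scalar model of the code
sequences shown earlier. The sketch below is a hypothetical model
written for this discussion, not a compiler's implementation. It
builds the register images that <code>vperm</code> sees: memory
order on big endian, and memory byte k in register byte 15-k on
little endian. It then applies <code>vperm t,a,b,c</code> for big
endian, or <code>vnand d,c,c</code> followed by
<code>vperm t,b,a,d</code> for little endian. Finally it converts
the result back to integer elements.
</para>
<programlisting language="c"><![CDATA[#include <assert.h>
#include <stdint.h>

/* vperm on register images (bytes numbered from the left, as in the ISA).
   Only the low-order 5 bits of each control byte are used. */
static void vperm(const uint8_t va[16], const uint8_t vb[16],
                  const uint8_t vc[16], uint8_t vt[16])
{
    for (unsigned j = 0; j < 16; j++) {
        unsigned idx = vc[j] & 31u;
        vt[j] = idx < 16 ? va[idx] : vb[idx - 16];
    }
}

/* Register position of memory byte k: unchanged on big endian,
   reversed on little endian. */
static unsigned reg_pos(unsigned k, int little)
{
    return little ? 15 - k : k;
}

/* Register image of a vector int stored with the target's byte order. */
static void int_to_reg(const uint32_t e[4], int little, uint8_t r[16])
{
    for (unsigned i = 0; i < 4; i++)
        for (unsigned j = 0; j < 4; j++)
            r[reg_pos(4 * i + j, little)] =
                (uint8_t)(e[i] >> (little ? 8 * j : 8 * (3 - j)));
}

/* Inverse of int_to_reg: store the register and read the elements back. */
static void reg_to_int(const uint8_t r[16], int little, uint32_t e[4])
{
    for (unsigned i = 0; i < 4; i++) {
        uint32_t v = 0;
        for (unsigned j = 0; j < 4; j++)
            v |= (uint32_t)r[reg_pos(4 * i + j, little)]
                 << (little ? 8 * j : 8 * (3 - j));
        e[i] = v;
    }
}

/* t = vec_perm (a, b, c) using the big- or little-endian code sequence. */
static void vec_perm_model(const uint32_t a[4], const uint32_t b[4],
                           const uint8_t c[16], int little, uint32_t t[4])
{
    uint8_t ra[16], rb[16], rc[16], rt[16];
    int_to_reg(a, little, ra);
    int_to_reg(b, little, rb);
    for (unsigned k = 0; k < 16; k++)
        rc[reg_pos(k, little)] = c[k];     /* one byte per PCV element */
    if (!little) {
        vperm(ra, rb, rc, rt);             /* vperm t,a,b,c */
    } else {
        uint8_t rd[16];
        for (unsigned j = 0; j < 16; j++)
            rd[j] = (uint8_t)~rc[j];       /* vnand d,c,c */
        vperm(rb, ra, rd, rt);             /* vperm t,b,a,d */
    }
    reg_to_int(rt, little, t);
}]]></programlisting>
<para>
With the two control vectors from this section, the model gives
identical big- and little-endian results for the whole-element PCV.
For the second PCV it gives the two unrelated results quoted above.
</para>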
<para>
The lesson here is to use <code>vec_perm</code> only to
reorder entire elements of a vector. If you must use
<code>vec_perm</code> for another purpose, your code must
include a test for endianness and separate algorithms for
big-endian and little-endian targets.
</para>
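<para>
As a sketch of such a test, the following C code builds a PCV that
selects individual bytes within 32-bit elements. It chooses the
byte indices according to the target's byte order, which it
detects with the GCC/Clang predefined macros
<code>__BYTE_ORDER__</code> and <code>__ORDER_LITTLE_ENDIAN__</code>.
Rather than calling <code>vec_perm</code>, it models it on host
memory. The model assumes, consistently with the code sequences
and results above, that on both targets PCV index n selects byte n
of <code>a</code> followed by <code>b</code> in memory order. All
function names are hypothetical.
</para>
<programlisting language="c"><![CDATA[#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte order of the target, from GCC/Clang predefined macros. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define TARGET_LITTLE_ENDIAN 1
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define TARGET_LITTLE_ENDIAN 0
#else
#error "unable to determine the target byte order"
#endif

/* Memory index of byte sig (0 = most significant) of 32-bit element elem.
   A PCV that selects bytes within elements must use this index, and it
   differs between the two byte orders. */
static unsigned byte_index(unsigned elem, unsigned sig)
{
    return TARGET_LITTLE_ENDIAN ? 4 * elem + (3 - sig) : 4 * elem + sig;
}

/* Scalar model of vec_perm on in-memory bytes: result byte k is byte
   pcv[k] & 31 of a followed by b, with each vector in memory order. */
static void perm_model(const uint32_t a[4], const uint32_t b[4],
                       const uint8_t pcv[16], uint32_t out[4])
{
    uint8_t ab[32], res[16];
    memcpy(ab, a, 16);
    memcpy(ab + 16, b, 16);
    for (unsigned k = 0; k < 16; k++)
        res[k] = ab[pcv[k] & 31];
    memcpy(out, res, 16);
}

/* Hypothetical PCV: keep the three most significant bytes of each element
   of b and replace its least significant byte with the most significant
   byte of the same element of a. */
static void build_pcv(uint8_t pcv[16])
{
    for (unsigned i = 0; i < 4; i++) {
        for (unsigned s = 0; s < 3; s++)
            pcv[byte_index(i, s)] = (uint8_t)(16 + byte_index(i, s));
        pcv[byte_index(i, 3)] = (uint8_t)byte_index(i, 0);
    }
}]]></programlisting>
<para>
On either byte order, this PCV gives the same integer results. For
example, elements 0x11223344 of <code>a</code> and 0xa1a2a3a4 of
<code>b</code> give 0xa1a2a311.
</para>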
</section>
</section>

</chapter>