<!--
Copyright (c) 2019 OpenPOWER Foundation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
<!-- Chapter Title goes here. -->
<title>The Power Bi-Endian Vector Programming Model</title>
<para>
To ensure portability of applications optimized to exploit the
SIMD functions of Power ISA processors, this reference defines a
set of functions and data types for SIMD programming. Compliant
compilers will provide suitable support for these functions,
preferably as built-in functions that translate to one or more
Power ISA instructions.
</para>
<para>
Compilers are encouraged, but not required, to provide built-in
functions to access individual instructions in the IBM Power®
instruction set architecture. In most cases, each such built-in
function should provide direct access to the underlying
instruction.
</para>
<para>
However, to ease porting between little-endian (LE) and big-endian
(BE) Power systems, and between Power and other platforms, it is
preferable that some built-in functions provide the same semantics
on both LE and BE Power systems, even if this means that the
built-in functions are implemented with different instruction
sequences for LE and BE. To achieve this, the vector built-in
functions are defined as a set of functions derived from the
operations provided by the Power SIMD instructions. Unlike
traditional “hardware intrinsic” built-in functions, no fixed
mapping exists between these built-in functions and the generated
hardware instruction sequence. Rather, the compiler is free to
generate optimized instruction sequences that implement the
semantics of the program specified by the programmer using these
built-in functions.
</para>
<para>
As we've seen, the Power SIMD instructions operate on groups of 1,
2, 4, 8, or 16 vector elements at a time in 128-bit registers. On
a big-endian Power platform, vector elements are loaded from
memory into a register so that the 0th element occupies the
high-order bits of the register, and the (N &#8211; 1)th element
occupies the low-order bits of the register. This is referred to
as big-endian element order. On a little-endian Power platform,
vector elements are loaded from memory such that the 0th element
occupies the low-order bits of the register, and the (N &#8211;
1)th element occupies the high-order bits. This is referred to as
little-endian element order.
</para>
<note>
<para>
Much of the information in this chapter was formerly part of
Chapter 6 of the 64-Bit ELF V2 ABI Specification for Power.
</para>
</note>
<section>
<title>Language Elements</title>
<para>
The C and C++ languages are extended to use new identifiers
<code>vector</code>, <code>pixel</code>, <code>bool</code>,
<code>__vector</code>, <code>__pixel</code>, and
<code>__bool</code>. These keywords are used to specify vector
data types (<xref linkend="VIPR.ch-data-types" />). Because
these identifiers may conflict with keywords or standard library
names in more recent language standards for C and C++, compilers
may implement these in one of two ways.
</para>
<itemizedlist>
<listitem>
<para>
<code>__vector</code>, <code>__pixel</code>,
<code>__bool</code>, and <code>bool</code> are defined as
keywords, with <code>vector</code> and <code>pixel</code> as
predefined macros that expand to <code>__vector</code> and
<code>__pixel</code>, respectively.
</para>
</listitem>
<listitem>
<para>
<code>__vector</code>, <code>__pixel</code>, and
<code>__bool</code> are defined as keywords in all contexts,
while <code>vector</code>, <code>pixel</code>, and
<code>bool</code> are treated as keywords only within the
context of a type declaration.
</para>
</listitem>
</itemizedlist>
<para>
As a motivating example, the <emphasis
role="bold">vector</emphasis> token is used as a type in the
C++ Standard Template Library, and hence cannot be used as an
unrestricted keyword, but can be used in the context-sensitive
implementation. For example, <emphasis role="bold">vector
char</emphasis> is distinct from <emphasis
role="bold">std::vector</emphasis> in the context-sensitive
implementation.
</para>
<para>
Vector literals may be specified using a type cast and a set of
literal initializers in parentheses or braces. For example,
</para>
<programlisting>vector int x = (vector int) (4, -1, 3, 6);
vector double g = (vector double) { 3.5, -24.6 };</programlisting>
<para>
Current C compilers do not support literals for
<code>__int128</code> types. A <code>vector __int128</code>
constant can be constructed from smaller literals
with appropriate cast-shift-or logic. For example,
<programlisting>
vector unsigned __int128 x = { (((unsigned __int128)0x1020304050607080) &lt;&lt; 64) | 0x90A0B0C0D0E0F000 };
</programlisting>
</para>
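<para>
The same cast-shift-or idiom can be wrapped in small helpers. The
following is a minimal sketch, assuming a GCC- or Clang-compatible
compiler on a 64-bit target where <code>unsigned __int128</code> is
available; the helper names are hypothetical and not part of this
reference.
</para>

```c
#include <stdint.h>

/* Build a 128-bit value from two 64-bit halves (cast-shift-or).
   The cast must come first: shifting a 64-bit operand left by 64
   would be undefined behavior. */
static inline unsigned __int128 u128_from_halves(uint64_t hi, uint64_t lo)
{
    return ((unsigned __int128)hi << 64) | lo;
}

/* Recover the high-order doubleword of a 128-bit value. */
static inline uint64_t u128_high(unsigned __int128 x)
{
    return (uint64_t)(x >> 64);
}

/* Recover the low-order doubleword of a 128-bit value. */
static inline uint64_t u128_low(unsigned __int128 x)
{
    return (uint64_t)x;
}
```

<para>
For example, <code>u128_from_halves (0x1020304050607080,
0x90A0B0C0D0E0F000)</code> yields the scalar value used to
initialize <code>x</code> above.
</para>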
</section>
<section xml:id="VIPR.ch-data-types">
<title>Vector Data Types</title>
<para>
Languages provide support for the data types in <xref
linkend="VIPR.biendian.vectypes" /> to represent vector data
types stored in vector registers.
</para>
<para>
For the C and C++ programming languages (and related/derived
languages), the "Power SIMD C Types" listed in the leftmost
column of <xref linkend="VIPR.biendian.vectypes" /> may be used
when Power SIMD language extensions are enabled. Either
<code>vector</code> or <code>__vector</code> may be used in the
type name. Note that the ELFv2 ABI for Power also includes a
<code>vector _Float16</code> data type. As of this writing, no
current compilers for Power have implemented such a type. This
document does not include that type or any intrinsics related to
it.
</para>
<para>
For the Fortran language, <phrase revisionflag="changed"><xref
linkend="VIPR.biendian.fortrantypes" /></phrase> gives a correspondence
between Fortran and C/C++ language types.
</para>
<para>
The assignment operator always performs a byte-by-byte data copy
for vector data types.
</para>
<para>
Like other C/C++ language types, vector types may be defined to
have const or volatile properties. Vector data types can be
defined as being in static, auto, and register storage.
</para>
<para>
Pointers to vector types are defined like pointers of other
C/C++ types. Pointers to vector objects may be defined to have
const and volatile properties. Pointers to vector objects must
hold addresses divisible by 16, as vector objects are always
aligned on quadword (16-byte, or 128-bit) boundaries.
</para>
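<para>
The quadword alignment requirement is equivalent to the four
low-order address bits being zero. A minimal portable check,
assuming only standard C; <code>is_quadword_aligned</code> is a
hypothetical name:
</para>

```c
#include <stdbool.h>
#include <stdint.h>

/* True if p could legally point to a vector object, that is,
   if the address is divisible by 16 (quadword-aligned). */
static inline bool is_quadword_aligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```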
<para>
The preferred way to access vectors at an application-defined
address is by using vector pointers and the C/C++ dereference
operator <code>*</code>. As with other C/C++ data types, the
array reference operator <code>[]</code> may be applied to a
vector pointer with its usual meaning: it accesses the
<emphasis>N</emphasis>th vector object relative to the address
held in the pointer. The dereference operator <code>*</code> may
<emphasis>not</emphasis> be used to access data that is not
aligned at least to a quadword boundary. Built-in functions such as
<code><xref linkend="vec_xl" xrefstyle="select:title nopage"/></code> and
<code><xref linkend="vec_xst" xrefstyle="select:title nopage"/></code> are
provided for unaligned data access. Please refer to <xref
linkend="VIPR.biendian.unaligned" /> for an example.
</para>
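<para>
On Power, <code>vec_xl</code> is the intended interface for
unaligned loads. As a hedged, portable illustration of the same
idea, the sketch below uses the generic vector extension of GCC and
Clang (<code>vector_size</code>) in place of the Power
<code>vector</code> keyword, and <code>memcpy</code> in place of
<code>vec_xl</code>; the type and function names are hypothetical.
</para>

```c
#include <string.h>

/* GCC/Clang generic vector type standing in for vector signed int. */
typedef int v4si __attribute__((vector_size(16)));

/* Load 16 bytes from an address with no alignment guarantee.
   memcpy makes no alignment assumption, so the compiler must use an
   unaligned-safe load; dereferencing (const v4si *)p would not be
   valid for a misaligned p. */
static inline v4si load_v4si_unaligned(const void *p)
{
    v4si v;
    memcpy(&v, p, sizeof v);
    return v;
}
```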
<para>
One vector type may be cast to another vector type without
restriction. Such a cast is simply a reinterpretation of the
bits, and does not change the data. There are no default
conversions for vector types.
</para>
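<para>
A sketch of such a reinterpreting cast, again using GCC/Clang
generic vector types as stand-ins for <code>vector unsigned
int</code> and <code>vector unsigned char</code> (an assumption; the
names are hypothetical). Unlike a scalar cast from <code>unsigned
int</code> to <code>unsigned char</code>, no element is truncated or
converted; the same 16 bytes are simply viewed with a new element
type.
</para>

```c
#include <string.h>

typedef unsigned int  v4su  __attribute__((vector_size(16)));
typedef unsigned char v16uc __attribute__((vector_size(16)));

/* Equal-size vector casts reinterpret the bits; nothing is converted. */
static inline v16uc words_as_bytes(v4su w)  { return (v16uc)w; }
static inline v4su  bytes_as_words(v16uc b) { return (v4su)b; }

/* Compare the raw 16-byte images of a word vector and a byte vector. */
static inline int same_bits(v4su w, v16uc b)
{
    return memcmp(&w, &b, sizeof w) == 0;
}
```

<para>
Which byte of the byte vector holds the low-order byte of word 0
depends on the target's endianness, as described in <xref
linkend="VIPR.biendian.layout" />.
</para>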
<para>
Compilers are expected to recognize sequences of operations
that can be combined into a single hardware
instruction. For example, a load-and-splat hardware instruction
(such as <emphasis role="bold">lxvdsx</emphasis>)
might be generated for the following sequence:
</para>
<programlisting>double *double_ptr;
register vector double vd = vec_splats(*double_ptr);</programlisting>
<table frame="all" pgwide="1" xml:id="VIPR.biendian.vectypes">
<title>Vector Types</title>
<tgroup cols="4">
<colspec colname="c1" colwidth="20*" />
<colspec colname="c2" colwidth="10*" align="center" />
<colspec colname="c3" colwidth="15*" align="center" />
<colspec colname="c4" colwidth="40*" />
<thead>
<row>
<entry align="center">
<para>
<emphasis role="bold">Power SIMD C Types</emphasis>
</para>
</entry>
<entry align="center">
<para>
<emphasis role="bold">sizeof</emphasis>
</para>
</entry>
<entry align="center">
<para>
<emphasis role="bold">Alignment</emphasis>
</para>
</entry>
<entry align="center">
<para>
<emphasis role="bold">Description</emphasis>
</para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<para>vector unsigned char</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 16 unsigned bytes.</para>
</entry>
</row>
<row>
<entry>
<para>vector signed char</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 16 signed bytes.</para>
</entry>
</row>
<row>
<entry>
<para>vector bool char</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 16 bytes with a value of either 0 or
2<superscript>8</superscript> &#8211; 1.</para>
</entry>
</row>
<row>
<entry>
<para>vector unsigned short</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 8 unsigned halfwords.</para>
</entry>
</row>
<row>
<entry>
<para>vector signed short</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 8 signed halfwords.</para>
</entry>
</row>
<row>
<entry>
<para>vector bool short</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 8 halfwords with a value of either 0 or
2<superscript>16</superscript> &#8211; 1.</para>
</entry>
</row>
<row>
<entry>
<para>vector pixel</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 8 halfwords, each interpreted as a 1-bit
channel and three 5-bit channels.</para>
</entry>
</row>
<row>
<entry>
<para>vector unsigned int</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 4 unsigned words.</para>
</entry>
</row>
<row>
<entry>
<para>vector signed int</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 4 signed words.</para>
</entry>
</row>
<row>
<entry>
<para>vector bool int</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 4 words with a value of either 0 or
2<superscript>32</superscript> &#8211; 1.</para>
</entry>
</row>
<row>
<entry>
<para>vector unsigned long<footnote xml:id="vlong">
<para>The vector long types are deprecated due to their
ambiguity between 32-bit and 64-bit environments. The use
of the vector long long types is preferred.</para>
</footnote></para>
<para>vector unsigned long long</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 2 unsigned doublewords.</para>
</entry>
</row>
<row>
<entry>
<para>vector signed long<footnoteref linkend="vlong" /></para>
<para>vector signed long long</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 2 signed doublewords.</para>
</entry>
</row>
<row>
<entry>
<para>vector bool long<footnoteref linkend="vlong" /></para>
<para>vector bool long long</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 2 doublewords with a value of either 0 or
2<superscript>64</superscript> &#8211; 1.</para>
</entry>
</row>
<row>
<entry>
<para>vector unsigned __int128</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 1 unsigned quadword.</para>
</entry>
</row>
<row>
<entry>
<para>vector signed __int128</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 1 signed quadword.</para>
</entry>
</row>
<row>
<entry>
<para>vector float</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 4 single-precision floats.</para>
</entry>
</row>
<row>
<entry>
<para>vector double</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 2 double-precision floats.</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
</section>
<section>
<title>Vector Operators</title>
<para>
In addition to the dereference and assignment operators, the
Power Bi-Endian Vector Programming Model allows the usual pointer
operators (such as pointer arithmetic and comparison) to be
applied to pointers to vector types.
</para>
<para>
The traditional C/C++ unary operators (<code>+</code>,
<code>-</code>, and <code>~</code>) are defined on vector types.
The traditional C/C++ binary operators (<code>+</code>,
<code>-</code>, <code>*</code>, <code>%</code>, <code>/</code>,
shift, logical, and comparison) and the ternary operator
(<code>?:</code>)
are defined on like vector types.
Other than <code>?:</code>, these operators perform their
operations "elementwise" on the base elements of the operands,
as follows.
</para>
<para>
For unary operators, the specified operation is performed on
each base element of the single operand to derive the result
value placed into the corresponding element of the vector
result. The result type of unary operations is the type of the
single operand. For example,
</para>
<programlisting>vector signed int a, b;
a = -b;</programlisting>
<para>
produces the same result as
</para>
<programlisting>vector signed int a, b;
a = vec_neg (b);</programlisting>
<para>
For binary operators, the specified operation is performed on
corresponding base elements of both operands to derive the
result value for each vector element of the vector result. Both
operands of the binary operators must have the same vector type
with the same base element type. The result of binary operators
is the same type as the type of the operands. For example,
</para>
<programlisting>vector signed int a, b;
a = a + b;</programlisting>
<para>
produces the same result as
</para>
<programlisting>vector signed int a, b;
a = vec_add (a, b);</programlisting>
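<para>
The elementwise semantics can be tried without Power hardware. The
following minimal sketch assumes the GCC/Clang generic vector
extension as a stand-in for <code>vector signed int</code>; the
helper names are hypothetical. With that extension, a comparison
yields 0 or -1 (all bits set) in each element, the same encoding
used by the <code>vector bool</code> types in <xref
linkend="VIPR.biendian.vectypes" />.
</para>

```c
/* GCC/Clang generic vector type standing in for vector signed int. */
typedef int v4si __attribute__((vector_size(16)));

/* Unary minus, applied to each element independently. */
static inline v4si v4_neg(v4si b)         { return -b; }

/* Binary +, applied to corresponding elements of a and b. */
static inline v4si v4_add(v4si a, v4si b) { return a + b; }

/* Elementwise comparison: -1 where a[i] > b[i], 0 elsewhere. */
static inline v4si v4_gt(v4si a, v4si b)  { return a > b; }
```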
<para>
For the ternary operator (<code>?:</code>), the first operand must
be an integral type, used to select between the second and third
operands which must be of the same vector type.
The result of the ternary operator will also have that type.
For example,
<programlisting>
int test_value;
vector signed int a, b, r;
r = test_value ? a : b;
</programlisting>
produces the same result as
<programlisting>
int test_value;
vector signed int a, b, r;
if (test_value)
r = a;
else
r = b;
</programlisting>
</para>
<para>
Further, the array reference operator may be applied to vector
data types, yielding an l-value corresponding to the specified
element in accordance with the vector element numbering rules (see
<xref linkend="VIPR.biendian.layout" />). An l-value may either
be assigned a new value or accessed for reading its value. For
example,
</para>
<programlisting>vector signed int a;
signed int b, c;
b = a[0];
a[3] = c;</programlisting>
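<para>
Because <code>[]</code> yields an l-value, it can appear on either
side of an assignment. The sketch below, assuming GCC/Clang generic
vectors as a stand-in for <code>vector signed int</code> and using
hypothetical helper names, also checks that element
<emphasis>i</emphasis> is the <emphasis>i</emphasis>th
<code>int</code> in the vector's memory image, which is what natural
element order implies on both BE and LE targets.
</para>

```c
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));

/* Subscripting yields an l-value, so an element can be assigned. */
static inline v4si set_element(v4si a, int i, int value)
{
    a[i] = value;
    return a;
}

/* Read element i through the vector's memory image instead of []. */
static inline int element_from_memory(v4si a, int i)
{
    int words[4];
    memcpy(words, &a, sizeof words);
    return words[i];
}
```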
</section>
<section xml:id="VIPR.biendian.layout">
<title>Vector Layout and Element Numbering</title>
<para>
Vector data types consist of a homogeneous sequence of elements
of the base data type specified in the vector data
type. Individual elements of a vector can be addressed by a
vector element number. To understand how vector elements are
represented in memory and in registers, it is best to start with
some simple concepts of endianness.
</para>
<figure pgwide="1" xml:id="scalar-endian">
<title>Scalar Quantities and Endianness</title>
<mediaobject>
<imageobject>
<imagedata fileref="Scalar-endian.png" format="PNG"
scalefit="1" width="100%" />
</imageobject>
</mediaobject>
</figure>
<para>
<xref linkend="scalar-endian" /> shows different representations
of a 64-bit scalar integer with the hexadecimal value
<code>0x0123456789ABCDEF</code>. We say that the most
significant byte (MSB) of this value is <code>0x01</code>, and
its least significant byte (LSB) is <code>0xEF</code>. The scalar
value is stored using eight bytes of memory. On a little-endian
(LE) system, the LSB is stored at the lowest address of these
eight bytes, and the MSB is stored at the highest address. On a
big-endian (BE) system, the MSB is stored at the lowest address
of these eight bytes, and the LSB is stored at the highest
address. Regardless of the memory order, the register
representation of the scalar value is identical; the MSB is
located on the "left" end of the register, and the LSB is
located on the "right" end.
</para>
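<para>
The memory image described above can be observed from portable C
by copying a scalar into a byte array. This minimal sketch uses a
hypothetical helper, <code>byte_at_offset</code>; its checks assume
the GCC/Clang predefined macros <code>__BYTE_ORDER__</code> and
<code>__ORDER_LITTLE_ENDIAN__</code>. Note that shifts act on the
value rather than the memory image, so <code>v &gt;&gt; 56</code>
yields the MSB <code>0x01</code> on either system.
</para>

```c
#include <stdint.h>
#include <string.h>

/* Return the byte stored at the given offset within the memory
   image of a 64-bit scalar; offset 0 is the lowest address. */
static inline unsigned char byte_at_offset(uint64_t x, unsigned offset)
{
    unsigned char bytes[sizeof x];
    memcpy(bytes, &x, sizeof x);   /* copy the memory image, not the value */
    return bytes[offset];
}
```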
<para>
Of course, the concept of "left" and "right" is a useful
fiction; there is no guarantee that the circuitry of a hardware
register is laid out this way. However, we will see, as we deal
with vector elements, that the concepts of left and right are
more natural for human understanding than byte and element
significance. Indeed, most programming languages have
operators, such as shift-left and shift-right, that use this
same terminology.
</para>
<para>
Let's move from scalars to arrays, which are more interesting to
us since we can use vector registers to operate on arrays, or
portions of larger arrays. Suppose we
have an array of bytes with values 0 through 15, as shown in
<xref linkend="byte-array-endian" />. Note that each byte is a
separate data element with only one possible representation in
memory, so the array of bytes looks identical in memory,
regardless of whether we are using a BE system or an LE system.
But when we load these 16 bytes into a vector register, perhaps
by using the ISA 3.0 <emphasis role="bold">lxv</emphasis>
instruction, the byte at the lowest address on an LE system will
be placed in the LSB of the vector register, but on a BE system
will be placed in the MSB of the vector register. Thus the
array elements appear "right to left" in the register on an LE
system, and "left to right" in the register on a BE system.
</para>
<figure pgwide="1" xml:id="byte-array-endian">
<title>Byte Arrays and Endianness</title>
<mediaobject>
<imageobject>
<imagedata fileref="Byte-array-endian.png" format="PNG"
scalefit="1" width="100%" />
</imageobject>
</mediaobject>
</figure>
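<para>
As a rough model, which is an assumption for illustration rather
than a description of how <emphasis role="bold">lxv</emphasis> is
implemented, reading the 16 bytes as a native 128-bit integer
places them the same way: on LE the lowest-address byte becomes the
least significant (rightmost) byte, and on BE the most significant
(leftmost). The sketch assumes GCC or Clang on a 64-bit target; the
helper names are hypothetical.
</para>

```c
#include <string.h>

/* Model of a 16-byte register load: read memory as a native 128-bit
   integer, so byte placement follows the target's endianness. */
static inline unsigned __int128 load_register_model(const unsigned char mem[16])
{
    unsigned __int128 r;
    memcpy(&r, mem, sizeof r);
    return r;
}

/* Byte k of the modeled register, counted from the right (k = 0 is
   the least significant byte). */
static inline unsigned reg_byte_from_right(unsigned __int128 r, unsigned k)
{
    return (unsigned)(r >> (8 * k)) & 0xFFu;
}
```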
<para>
Things become even more interesting when we consider arrays of
larger elements. In <xref linkend="word-array-endian" />, we
see the layout of an array of four 32-bit integers, where the 0th
element has hexadecimal value <code>0x00010203</code>, the 1st
element has value <code>0x04050607</code>, the 2nd element has
value <code>0x08090A0B</code>, and the 3rd element has value
<code>0x0C0D0E0F</code>. The order of the array elements in
memory is the same for both LE and BE systems; but the layout of
each element itself is reversed. When the <emphasis
role="bold">lxv</emphasis> instruction is used to load the
memory into a vector register, again the low address is loaded
into the LSB of the register for LE, but loaded into the MSB of
the register for BE. The effect is that the array elements
again appear right-to-left on an LE system and left-to-right on a
BE system. Note that each 32-bit element of the array has its
most significant bit "on the left" whether an LE or BE system is
in use. This is of course necessary for proper arithmetic to be
performed on the array elements by vector instructions.
</para>
<figure pgwide="1" xml:id="word-array-endian">
<title>Word Arrays and Endianness</title>
<mediaobject>
<imageobject>
<imagedata fileref="Word-array-endian.png" format="PNG"
scalefit="1" width="100%" />
</imageobject>
</mediaobject>
</figure>
<para>
Thus on a BE system, we number vector elements starting with 0
on the left, while on an LE system, we number vector elements
starting with 0 on the right. We will informally refer to these
as big-endian and little-endian vector element numberings and
vector layouts.
</para>
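<para>
Using the same kind of 128-bit register model (again an
assumption, with a hypothetical helper name), the word array from
<xref linkend="word-array-endian" /> can be indexed with these
numberings: element 0 is found at the right end of the register on
LE and at the left end on BE, yet each word keeps its most
significant byte on the left.
</para>

```c
#include <stdint.h>
#include <string.h>

/* Extract word element i (natural numbering) from a 128-bit register
   model that was loaded natively from memory. */
static inline uint32_t word_element(unsigned __int128 reg, unsigned i)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    unsigned shift = 32 * i;         /* LE: element 0 is rightmost */
#else
    unsigned shift = 32 * (3 - i);   /* BE: element 0 is leftmost */
#endif
    return (uint32_t)(reg >> shift);
}
```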
<para>
This element numbering shall also be used by the <code>[]</code>
operator for accessing vector elements, which some compilers
provide as an extension to the C/C++ languages, as well as by other
language extensions or library constructs that directly or
indirectly refer to elements by their element number.
</para>
<para>
Application programs may query the vector element ordering in
use by testing the <code>__VEC_ELEMENT_REG_ORDER__</code> macro. This macro
has two possible values:
</para>
<informaltable frame="none" rowsep="0" colsep="0">
<tgroup cols="2">
<colspec colname="c1" colwidth="40*" />
<colspec colname="c2" colwidth="60*" />
<tbody>
<row>
<entry>
<para><code>__ORDER_LITTLE_ENDIAN__</code></para>
</entry>
<entry>
<para>Vector elements use little-endian element ordering.</para>
</entry>
</row>
<row>
<entry>
<para><code>__ORDER_BIG_ENDIAN__</code></para>
</entry>
<entry>
<para>Vector elements use big-endian element ordering.</para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>
This macro is no longer as useful as it once was. Its primary use
case was big-endian vector layout in little-endian
environments, which is now deprecated as discussed in <xref
linkend="VIPR.biendian.BELE" />. Testing for
<code>__BIG_ENDIAN__</code> or <code>__LITTLE_ENDIAN__</code> is
generally equivalent.
</para>
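<para>
A hedged sketch of such a test that also compiles on non-Power
targets: it prefers <code>__VEC_ELEMENT_REG_ORDER__</code> when the
compiler defines it and otherwise falls back to the GCC/Clang macro
<code>__BYTE_ORDER__</code> (an assumption for portability).
<code>VEC_ELEMENT_ORDER</code> and the function name are
hypothetical.
</para>

```c
/* Select the vector element order at compile time. Power compilers
   define __VEC_ELEMENT_REG_ORDER__; elsewhere, fall back to the
   scalar byte order reported by GCC/Clang. */
#if defined(__VEC_ELEMENT_REG_ORDER__)
#  define VEC_ELEMENT_ORDER __VEC_ELEMENT_REG_ORDER__
#else
#  define VEC_ELEMENT_ORDER __BYTE_ORDER__
#endif

/* Nonzero when vector elements use little-endian element order. */
static inline int vec_elements_are_little_endian(void)
{
    return VEC_ELEMENT_ORDER == __ORDER_LITTLE_ENDIAN__;
}
```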
<note>
<para>
Remember that each element in a vector has the same representation
in both big- and little-endian element orders. That is, an
<code>int</code> is always 32 bits, with the sign bit in the
high-order position. Programmers must be aware of this when
programming with mixed data types, such as an instruction that
multiplies two <code>short</code> elements to produce an
<code>int</code> element. Always access entire elements to
avoid potential endianness issues.
</para>
</note>
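<para>
The hazard is easy to demonstrate in scalar C. In this minimal
sketch (hypothetical helper names; the checks assume the GCC/Clang
byte-order macros), reading the halfword at the lowest address of
an <code>int</code> gives different answers on BE and LE, while
extracting the high halfword with a shift, which accesses the whole
element, does not.
</para>

```c
#include <stdint.h>
#include <string.h>

/* Partial-element access: read the halfword at the lowest address of
   a 32-bit word. The result depends on endianness. */
static inline uint16_t first_halfword_in_memory(uint32_t w)
{
    uint16_t h;
    memcpy(&h, &w, sizeof h);
    return h;
}

/* Whole-element access: the high halfword by value, endian-independent. */
static inline uint16_t high_halfword(uint32_t w)
{
    return (uint16_t)(w >> 16);
}
```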
</section>
<section>
<title>Vector Built-In Functions</title>
<para>
Some of the Power SIMD hardware instructions refer, implicitly
or explicitly, to vector element numbers. For example, the
<code>vspltb</code> instruction has as one of its inputs an
index into a vector. The element at that index position is to
be replicated in every element of the output vector. For
another example, the <code>vmuleuh</code> instruction operates on
the even-numbered elements of its input vectors. The hardware
instructions define these element numbers using big-endian
element order, even when the machine is running in little-endian
mode. Thus, a built-in function that maps directly to the
underlying hardware instruction, regardless of the target
endianness, has the potential to confuse programmers on
little-endian platforms.
</para>
<para>
It is more useful for built-in functions that map to these
instructions to use natural element order. That is, the
explicit or implicit element numbers specified by such built-in
functions should be interpreted using big-endian element order
on a big-endian platform, and using little-endian element order
on a little-endian platform.
</para>
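<para>
The renumbering involved is simple arithmetic. On an LE target,
natural element <emphasis>i</emphasis> of an
<emphasis>n</emphasis>-element vector is big-endian element
<emphasis>n</emphasis> &#8211; 1 &#8211; <emphasis>i</emphasis>. For
example, an LE compiler might splat natural byte element
<emphasis>i</emphasis> using <code>vspltb</code> with index 15
&#8211; <emphasis>i</emphasis>, and, because the mapping flips
parity when <emphasis>n</emphasis> is even, implement an
even-element multiply with <code>vmulouh</code> rather than
<code>vmuleuh</code>. This sketch shows only the arithmetic, not
any particular compiler's output; the function name is
hypothetical.
</para>

```c
/* Hardware instruction fields number elements in big-endian order.
   On a little-endian target, natural element i of an n-element
   vector is big-endian element n - 1 - i. For even n this flips
   parity: natural even elements are big-endian odd elements. */
static inline unsigned natural_to_be_element(unsigned i, unsigned n)
{
    return n - 1 - i;
}
```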
<para>
The descriptions of the built-in functions in <xref
linkend="VIPR.vec-ref" /> contain notes on endian issues that
apply to each built-in function. Furthermore, each built-in
function that requires different compiler implementations for
big-endian and little-endian targets is accompanied by sample
implementations for both BE and LE. These sample
implementations are only intended as examples; designers of a
compiler are free to use other methods to implement the
specified semantics.
</para>
<para>
Of course, most built-in functions operate only on corresponding
sets of elements of input vectors to produce output vectors, and
thus are not "endian-sensitive." A complete list of
endian-sensitive built-in functions can be found in <xref
linkend="VIPR.biendian.sensitive" />.
</para>
<table frame="all" pgwide="1" xml:id="VIPR.biendian.sensitive">
<title>Endian-Sensitive Built-In Functions</title>
<tgroup cols="3">
<colspec colname="c1" colwidth="15*" align="center" />
<colspec colname="c2" colwidth="15*" align="center" />
<colspec colname="c3" colwidth="15*" align="center" />
<tbody>
<row>
<entry>
<para><code><xref linkend="vec_bperm" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_inserth"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_signextll"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_cipher_be" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_insertl"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_signextq"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_cipherlast_be" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_mergee" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_sld" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_clr_first"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para><code><xref linkend="vec_mergeh" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_sldb" xrefstyle="select:title
nopage"/></code>
</para>
</entry>
</row>
<row>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_clr_last"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para><code><xref linkend="vec_mergel" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_sldw" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_doublee" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_mergeo" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_sll" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_doubleh" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_mfvscr" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_slo" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_doublel" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_mule" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_slv" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_doubleo" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_mulo" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_splat" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_extract" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_ncipher_be" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_splati_ins"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_extract_fp32_from_shorth" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_ncipherlast_be" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_srdb" xrefstyle="select:title
nopage"/></code>
</para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_extract_fp32_from_shortl" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_pack" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_srl" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_extract4b" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_pack_to_short_fp32" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_sro" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_extracth"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para><code><xref linkend="vec_packpx" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_srv" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_extractl"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para><code><xref linkend="vec_packs" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_stril"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_first_match_index" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_packsu" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_stril_p"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_first_match_or_eos_index" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_perm" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_strir"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_first_mismatch_index" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_permx"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_strir_p"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_first_mismatch_or_eos_index" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_permxor" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_sum2s" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_float2" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_pmsum_be" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_sums" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_floate" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_replace_elt"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para><code><xref linkend="vec_unpackh"
xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_floato" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_replace_unaligned"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para><code><xref linkend="vec_unpackl" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_genbm"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para><code><xref linkend="vec_reve" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_unsigned2" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_gendm"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para><code><xref linkend="vec_sbox_be" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_unsignede" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_genhm"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para><code><xref linkend="vec_shasigma_be" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_unsignedo" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_genpcvm"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para><code><xref linkend="vec_signed2" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_xl" xrefstyle="select:title nopage"/></code> (ISA 2.07 only)</para>
</entry>
</row>
<row>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_genwm"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para><code><xref linkend="vec_signede" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_xl_be" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_insert" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_signedo" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para><code><xref linkend="vec_xst" xrefstyle="select:title nopage"/></code> (ISA 2.07 only)</para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_insert4b" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para revisionflag="added">
<code><xref linkend="vec_signexti"
xrefstyle="select:title nopage"/></code>
</para>
</entry>
<entry>
<para><code><xref linkend="vec_xst_be" xrefstyle="select:title nopage"/></code></para>
</entry>
</row>
</tbody>
</tgroup>
</table>
<section>
<title>Extended Data Movement Functions</title>
<para>
The built-in functions in <xref
linkend="VIPR.biendian.vmx-mem" /> map to AltiVec/VMX load and
store instructions and provide access to the “auto-aligning”
memory instructions of the VMX ISA, in which the low-order
address bits are discarded before the memory access is
performed. These instructions load and store data in accordance
with the program's current endian mode and do not need to be
adapted by the compiler to reflect little-endian operation
during code generation.
</para>
<para>
Before the bi-endian programming model was introduced, the
<code>vec_lvsl</code> and <code>vec_lvsr</code> intrinsics
were supported. These could be used in conjunction with
<code><xref linkend="vec_perm" xrefstyle="select:title nopage"/></code>
and VMX load and store instructions to access unaligned data.
The <code>vec_lvsl</code> and <code>vec_lvsr</code> interfaces
are deprecated in favor of the interfaces specified here. For
compatibility, the
built-in pseudo sequences published in previous VMX documents
continue to work with little-endian data layout and the
little-endian vector layout described in this document.
However, the use of these sequences in new code is discouraged
and usually results in worse performance. It is recommended
that compilers issue a warning when these functions are used
in little-endian environments.
</para>
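<para>
The legacy idiom can be sketched as a byte-level model in portable
C. The sketch is illustrative only. It models big-endian byte
numbering, represents addresses as offsets into a buffer that is
assumed to start on a 16-byte boundary, and uses hypothetical names
(<code>vreg</code>, <code>lvx_model</code>, <code>lvsl_model</code>,
<code>vperm_model</code>, <code>unaligned_load_model</code>) that
are not part of this specification. Two aligned loads bracket the
unaligned address, and a permute whose control vector comes from
<code>lvsl</code> extracts the 16 bytes that start there.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One vector register, in big-endian byte numbering (b[0] is leftmost). */
typedef struct { uint8_t b[16]; } vreg;

/* lvx (vec_ld): load the aligned 16-byte block containing mem[off].
   The low-order four bits of the offset are discarded; mem is assumed
   to start on a 16-byte boundary. */
static vreg lvx_model(const uint8_t *mem, size_t off) {
    vreg r;
    memcpy(r.b, mem + (off & ~(size_t)15), sizeof r.b);
    return r;
}

/* lvsl (vec_lvsl): the bytes {sh, sh+1, ..., sh+15}, where sh = off & 15. */
static vreg lvsl_model(size_t off) {
    vreg r;
    for (unsigned i = 0; i < 16; i++)
        r.b[i] = (uint8_t)((off & 15) + i);
    return r;
}

/* vperm (vec_perm): result byte i is byte (c[i] & 31) of the 32 bytes x||y. */
static vreg vperm_model(vreg x, vreg y, vreg c) {
    vreg r;
    for (unsigned i = 0; i < 16; i++) {
        unsigned k = c.b[i] & 31u;
        r.b[i] = (k < 16) ? x.b[k] : y.b[k - 16];
    }
    return r;
}

/* Legacy unaligned load: two aligned loads that bracket off, merged by a
   permute controlled by lvsl. The second load uses off + 15 rather than
   off + 16 so that an already aligned off never touches the next block. */
static vreg unaligned_load_model(const uint8_t *mem, size_t off) {
    vreg msq = lvx_model(mem, off);       /* block holding the first byte */
    vreg lsq = lvx_model(mem, off + 15);  /* block holding the last byte */
    return vperm_model(msq, lsq, lvsl_model(off));
}
```
]]></programlisting>
<para>
For any offset whose second aligned block still lies inside the
buffer, the model returns the same 16 bytes that a direct unaligned
copy starting at that offset would return.
</para>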
<table frame="all" pgwide="1" xml:id="VIPR.biendian.vmx-mem">
<title>VMX Memory Access Built-In Functions</title>
<tgroup cols="3">
<colspec colname="c1" colwidth="15*" align="center" />
<colspec colname="c2" colwidth="35*" align="center" />
<colspec colname="c3" colwidth="50*" />
<thead>
<row>
<entry>
<para>
<emphasis role="bold">Built-in Function</emphasis>
</para>
</entry>
<entry>
<para>
<emphasis role="bold">Corresponding Power
Instructions</emphasis>
</para>
</entry>
<entry align="center">
<para>
<emphasis role="bold">Implementation Notes</emphasis>
</para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<para><code><xref linkend="vec_ld" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para>lvx</para>
</entry>
<entry>
<para>Hardware works as a function of endian mode.</para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_lde" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para>lvebx, lvehx, lvewx</para>
</entry>
<entry>
<para>Hardware works as a function of endian mode.</para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_ldl" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para>lvxl</para>
</entry>
<entry>
<para>Hardware works as a function of endian mode.</para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_st" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para>stvx</para>
</entry>
<entry>
<para>Hardware works as a function of endian mode.</para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_ste" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para>stvebx, stvehx, stvewx</para>
</entry>
<entry>
<para>Hardware works as a function of endian mode.</para>
</entry>
</row>
<row>
<entry>
<para><code><xref linkend="vec_stl" xrefstyle="select:title nopage"/></code></para>
</entry>
<entry>
<para>stvxl</para>
</entry>
<entry>
<para>Hardware works as a function of endian mode.</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Instead of the <code>vec_lvsl</code> and <code>vec_lvsr</code>
sequences described above, it is recommended that programmers
use the
<code><xref linkend="vec_xl" xrefstyle="select:title nopage"/></code> and
<code><xref linkend="vec_xst" xrefstyle="select:title nopage"/></code>
vector built-in functions to access unaligned data streams. See
the descriptions of these built-in functions in <xref
linkend="VIPR.vec-ref" /> for further details, including
implementation notes.
</para>
</section>
<section xml:id="VIPR.biendian.BELE">
<title>Big-Endian Vector Layout in Little-Endian Environments
(Deprecated)</title>
<para>
Versions 1.0 through 1.4 of the 64-Bit ELFv2 ABI Specification
for Power allowed compilers to optionally support big-endian
element ordering in little-endian environments.
This was initially deemed useful for porting certain libraries
that assumed big-endian element ordering regardless of the
endianness of their input streams. In practice, this
introduced serious compiler complexity without much utility.
Thus this support (previously controlled by switches
<code>-maltivec=be</code> and/or <code>-qaltivec=be</code>) is
now deprecated. Current versions of the <phrase
revisionflag="changed">GCC, Clang, and Open XL</phrase>
compilers do not implement this support.
</para>
</section>
</section>
<section revisionflag="deleted">
<title>Language-Specific Vector Support for Other
Languages</title>
<section>
<title>Fortran</title>
<para>
<xref linkend="VIPR.biendian.fortran-types" /> shows the
correspondence between the C/C++ types described in this
document and their Fortran equivalents. In Fortran, the
Boolean vector data types are represented by
<code>VECTOR(UNSIGNED(</code><emphasis>n</emphasis><code>))</code>.
</para>
<table frame="all" pgwide="1" xml:id="VIPR.biendian.fortran-types">
<title>Fortran Vector Data Types</title>
<tgroup cols="2">
<colspec colname="c1" colwidth="50*" />
<colspec colname="c2" colwidth="50*" />
<thead>
<row>
<entry align="center">
<para>
<emphasis role="bold">XL Fortran Vector Type</emphasis>
</para>
</entry>
<entry align="center">
<para>
<emphasis role="bold">XL C/C++ Vector Type</emphasis>
</para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<para>VECTOR(INTEGER(1))</para>
</entry>
<entry>
<para>vector signed char</para>
</entry>
</row>
<row>
<entry>
<para>VECTOR(INTEGER(2))</para>
</entry>
<entry>
<para>vector signed short</para>
</entry>
</row>
<row>
<entry>
<para>VECTOR(INTEGER(4))</para>
</entry>
<entry>
<para>vector signed int</para>
</entry>
</row>
<row>
<entry>
<para>VECTOR(INTEGER(8))</para>
</entry>
<entry>
<para>vector signed long long, vector signed long<footnote
xml:id="vlongappalling">
<para>The vector long types are deprecated due to their
ambiguity between 32-bit and 64-bit environments. The use
of the vector long long types is preferred.</para>
</footnote></para>
</entry>
</row>
<row>
<entry>
<para>VECTOR(INTEGER(16))</para>
</entry>
<entry>
<para>vector signed __int128</para>
</entry>
</row>
<row>
<entry>
<para>VECTOR(UNSIGNED(1))</para>
</entry>
<entry>
<para>vector unsigned char</para>
</entry>
</row>
<row>
<entry>
<para>VECTOR(UNSIGNED(2))</para>
</entry>
<entry>
<para>vector unsigned short</para>
</entry>
</row>
<row>
<entry>
<para>VECTOR(UNSIGNED(4))</para>
</entry>
<entry>
<para>vector unsigned int</para>
</entry>
</row>
<row>
<entry>
<para>VECTOR(UNSIGNED(8))</para>
</entry>
<entry>
<para>vector unsigned long long, vector unsigned long<footnoteref
linkend="vlongappalling" /></para>
</entry>
</row>
<row>
<entry>
<para>VECTOR(UNSIGNED(16))</para>
</entry>
<entry>
<para>vector unsigned __int128</para>
</entry>
</row>
<row>
<entry>
<para>VECTOR(REAL(4))</para>
</entry>
<entry>
<para>vector float</para>
</entry>
</row>
<row>
<entry>
<para>VECTOR(REAL(8))</para>
</entry>
<entry>
<para>vector double</para>
</entry>
</row>
<row>
<entry>
<para>VECTOR(PIXEL)</para>
</entry>
<entry>
<para>vector pixel</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Because the Fortran language does not support pointers, vector
built-in functions that expect pointers to a base type take an
array element reference to indicate the address of a memory
location that is the subject of a memory access built-in
function.
</para>
<para>
Because the Fortran language does not support type casts, the
<code>vec_convert</code> and <code>vec_concat</code> built-in
functions shown in <xref linkend="VIPR.endian.convert" /> are
provided to perform bit-exact type conversions between vector
types.
</para>
<table frame="all" pgwide="1" xml:id="VIPR.endian.convert">
<title>Built-In Vector Conversion Functions</title>
<tgroup cols="2">
<colspec colname="c1" colwidth="30*" align="center" />
<colspec colname="c2" colwidth="70*" />
<thead>
<row>
<entry>
<para>
<emphasis role="bold">Group</emphasis>
</para>
</entry>
<entry align="center">
<para>
<emphasis role="bold">Description</emphasis>
</para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<para>VEC_CONCAT (ARG1, ARG2)<?linebreak?>(Fortran)</para>
<para></para>
</entry>
<entry>
<para>Purpose:</para>
<para>Concatenates two elements to form a vector.</para>
<para>Result value:</para>
<para>The resulting vector consists of the two scalar elements,
ARG1 and ARG2, assigned to elements 0 and 1 (using the
environment’s native endian numbering), respectively.</para>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Note: </emphasis>This function corresponds to the C/C++ vector
constructor (vector type){a,b}. It is provided only for
languages without vector constructors.</para>
</listitem>
</itemizedlist>
</entry>
</row>
<row>
<entry>
<para></para>
</entry>
<entry>
<para>vector signed long long vec_concat (signed long long,
signed long long);</para>
</entry>
</row>
<row>
<entry>
<para></para>
</entry>
<entry>
<para>vector unsigned long long vec_concat (unsigned long long,
unsigned long long);</para>
</entry>
</row>
<row>
<entry>
<para></para>
</entry>
<entry>
<para>vector double vec_concat (double, double);</para>
</entry>
</row>
<row>
<entry>
<para>VEC_CONVERT(V, MOLD)</para>
</entry>
<entry>
<para>Purpose:</para>
<para>Converts a vector to a vector of a given type.</para>
<para>Class:</para>
<para>Pure function</para>
<para>Argument type and attributes:</para>
<itemizedlist spacing="compact">
<listitem>
<para>V Must be an INTENT(IN) vector.</para>
</listitem>
<listitem>
<para>MOLD Must be an INTENT(IN) vector. If it is a
variable, it need not be defined.</para>
</listitem>
</itemizedlist>
<para>Result type and attributes:</para>
<para>The result is a vector of the same type as MOLD.</para>
<para>Result value:</para>
<para>The result is as if it were on the left-hand side of an
intrinsic assignment with V on the right-hand side.</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
</section>
</section>
<section>
<title>Examples and Limitations</title>
<section xml:id="VIPR.biendian.unaligned">
<title>Unaligned vector access</title>
<para>
A common programming error is to cast a pointer to a base type
(such as <code>int</code>) to a pointer of the corresponding
vector type (such as <code>vector int</code>), and then
dereference the pointer. This constitutes undefined behavior,
because it casts a pointer with a smaller alignment
requirement to a pointer with a larger alignment requirement.
In the presence of undefined behavior, compilers may not
generate the code you expect.
</para>
<para>
Thus, do not write the following:
</para>
<programlisting> int a[4096];
vector int x = *((vector int *) a);</programlisting>
<para>
Instead, write this:
</para>
<programlisting> int a[4096];
vector int x = vec_xl (0, a);</programlisting>
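<para>
To make the semantics concrete without Power hardware, the sketch
below models <code>vec_xl</code> with <code>memcpy</code>, which is
the portable C way to read possibly unaligned data. It assumes, as
with <code>vec_ld</code>, that the first argument is a byte
displacement added to the pointer. The names
<code>v4i32_model</code>, <code>xl_model</code> and
<code>sum_model</code> are hypothetical; on a Power target the code
would call <code>vec_xl</code> itself.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a vector int: four 32-bit elements. */
typedef struct { int32_t e[4]; } v4i32_model;

/* Model of vec_xl (offset, base): 16 bytes starting at
   (const char *) base + offset, with no alignment requirement.
   memcpy is the portable C way to read possibly unaligned data. */
static v4i32_model xl_model(long offset, const int32_t *base) {
    v4i32_model v;
    memcpy(v.e, (const char *)base + offset, sizeof v.e);
    return v;
}

/* Sum n ints four at a time. The displacement is in bytes, so the
   group starting at element i is at byte offset i * sizeof *a. */
static int64_t sum_model(const int32_t *a, size_t n) {
    int64_t s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4i32_model v = xl_model((long)(i * sizeof *a), a);
        for (int j = 0; j < 4; j++)
            s += v.e[j];
    }
    for (; i < n; i++)   /* scalar tail */
        s += a[i];
    return s;
}
```
]]></programlisting>
<para>
Under this assumption, a loop that passes the element index
<code>i</code> instead of <code>i * sizeof *a</code> as the
displacement would read overlapping, misplaced data; the model makes
that mistake easy to catch in a host-side unit test.
</para>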
</section>
<section xml:id="VIPR.biendian.sld">
<title>vec_sld and vec_sro are not bi-endian</title>
<para>
One oddity in the bi-endian vector programming model is that
<code><xref linkend="vec_sld" xrefstyle="select:title nopage"/></code>
has big-endian semantics for code
compiled for both big-endian and little-endian targets.
Consequently, any code that uses
<code><xref linkend="vec_sld" xrefstyle="select:title nopage"/></code>
without guarding
it with a test on endianness is likely to be incorrect.
</para>
<para>
At the time that the bi-endian model was being developed, it
was discovered that existing code in several Linux packages
was using
<code><xref linkend="vec_sld" xrefstyle="select:title nopage"/></code>
in order to perform multiplies,
or to otherwise shift portions of base elements left. A
straightforward little-endian implementation of
<code><xref linkend="vec_sld" xrefstyle="select:title nopage"/></code>
would concatenate the two input vectors
in reverse order and shift bytes to the right. This would give
compatible results only for <code>vector char</code>
types. Those using this intrinsic as a cheap multiply, or to
shift bytes within larger elements, would see different
results on little-endian versus big-endian with such an
implementation. Therefore it was decided that
<code><xref linkend="vec_sld" xrefstyle="select:title nopage"/></code>
would not have a bi-endian
implementation.
</para>
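<para>
The difference can be made concrete with a byte-level model of the
<code>vsldoi</code> instruction. The sketch assumes, as described
above, that <code>vec_sld (x, y, sh)</code> is emitted as
<code>vsldoi x,y,sh</code> on both targets. It models a
<code>vector int</code> register in big-endian byte numbering, with
element i in register word i for big endian and in register word
3 - i for little endian. The type <code>vreg</code> and the
<code>*_model</code> helpers are hypothetical names.
</para>
<programlisting><![CDATA[
```c
#include <stdint.h>

/* One vector register, in big-endian byte numbering (b[0] is leftmost). */
typedef struct { uint8_t b[16]; } vreg;

/* vsldoi: bytes sh..sh+15 of the 32-byte concatenation x||y (0 <= sh <= 15). */
static vreg vsldoi_model(vreg x, vreg y, unsigned sh) {
    vreg r;
    for (unsigned i = 0; i < 16; i++) {
        unsigned k = i + sh;
        r.b[i] = (k < 16) ? x.b[k] : y.b[k - 16];
    }
    return r;
}

/* Element i of a vector int occupies register word i (big endian) or
   register word 3 - i (little endian); each word holds its value most
   significant byte first. */
static vreg words_to_reg(const uint32_t w[4], int le) {
    vreg r;
    for (int i = 0; i < 4; i++) {
        int slot = le ? 3 - i : i;
        for (int j = 0; j < 4; j++)
            r.b[4 * slot + j] = (uint8_t)(w[i] >> (24 - 8 * j));
    }
    return r;
}

static void reg_to_words(vreg r, int le, uint32_t w[4]) {
    for (int i = 0; i < 4; i++) {
        int slot = le ? 3 - i : i;
        uint32_t v = 0;
        for (int j = 0; j < 4; j++)
            v = (v << 8) | r.b[4 * slot + j];
        w[i] = v;
    }
}

/* vec_sld (x, y, sh) on vector int, emitted as vsldoi x,y,sh on both
   targets (big-endian semantics). */
static void vec_sld_model(const uint32_t x[4], const uint32_t y[4],
                          unsigned sh, int le, uint32_t t[4]) {
    reg_to_words(vsldoi_model(words_to_reg(x, le), words_to_reg(y, le), sh),
                 le, t);
}
```
]]></programlisting>
<para>
With a = {0xA0, 0xA1, 0xA2, 0xA3}, b = {0xB0, 0xB1, 0xB2, 0xB3} and
a shift of 4, the model gives {0xA1, 0xA2, 0xA3, 0xB0} for a
big-endian target but {0xB3, 0xA0, 0xA1, 0xA2} for a little-endian
target. In the model, swapping the inputs and using a shift of
16 - n on the little-endian target restores the big-endian element
order when n is a multiple of the element size; this is one possible
shape for the endianness guard mentioned above, not a recommendation
made by this reference.
</para>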
<para>
<code><xref linkend="vec_sro" xrefstyle="select:title nopage"/></code>
is not bi-endian for similar reasons.
</para>
</section>
<section xml:id="VIPR.biendian.vperm">
<title>Limitations on bi-endianness of vec_perm</title>
<para>
The <code><xref linkend="vec_perm" xrefstyle="select:title nopage"/></code>
intrinsic is bi-endian, provided
that it is used to reorder entire elements of the input
vectors.
</para>
<para>
To see why, consider the code generated for the following example:
</para>
<programlisting> vector int t;
vector int a = (vector int){0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f};
vector int b = (vector int){0x10111213, 0x14151617, 0x18191a1b, 0x1c1d1e1f};
vector char c = (vector char){0,1,2,3,28,29,30,31,12,13,14,15,20,21,22,23};
t = vec_perm (a, b, c);</programlisting>
<para>
For big endian, a compiler should generate:
</para>
<programlisting> vperm t,a,b,c</programlisting>
<para>
For little endian targeting a POWER8 system, a compiler should
generate:
</para>
<programlisting> vnand d,c,c
vperm t,b,a,d</programlisting>
<para>
For little endian targeting a POWER9 system, a compiler should
generate:
</para>
<programlisting> vpermr t,b,a,c</programlisting>
<para>
Note that the <code>vpermr</code> instruction itself performs the
modification of the permute control vector (PCV) <code>c</code>
that the <code>vnand</code> instruction performs explicitly on
POWER8. Because the hardware reads only the low-order 5 bits of
each element of the PCV, complementing an element has the effect
of subtracting it from 31.
</para>
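<para>
The three code sequences above can be checked against each other
with a byte-level model. The sketch below models registers in
big-endian byte numbering: little-endian element i of a
<code>vector int</code> occupies register word 3 - i, and
little-endian element i of a <code>vector char</code> occupies
register byte 15 - i. <code>vpermr</code> is modeled as selecting
byte 31 - index, which is the behavior the preceding paragraph
attributes to it. All type, enumerator and function names are
hypothetical.
</para>
<programlisting><![CDATA[
```c
#include <stdint.h>

/* One vector register, in big-endian byte numbering (b[0] is leftmost). */
typedef struct { uint8_t b[16]; } vreg;

/* vperm: result byte i is byte (c[i] & 31) of the 32-byte concatenation x||y. */
static vreg vperm_model(vreg x, vreg y, vreg c) {
    vreg r;
    for (int i = 0; i < 16; i++) {
        unsigned k = c.b[i] & 31u;
        r.b[i] = (k < 16) ? x.b[k] : y.b[k - 16];
    }
    return r;
}

/* vpermr: as vperm, but selects byte 31 - (c[i] & 31) instead. */
static vreg vpermr_model(vreg x, vreg y, vreg c) {
    vreg d;
    for (int i = 0; i < 16; i++)
        d.b[i] = (uint8_t)(31u - (c.b[i] & 31u));
    return vperm_model(x, y, d);
}

/* vector int element i: register word i (BE) or 3 - i (LE), MSB first. */
static vreg words_to_reg(const uint32_t w[4], int le) {
    vreg r;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            r.b[4 * (le ? 3 - i : i) + j] = (uint8_t)(w[i] >> (24 - 8 * j));
    return r;
}

static void reg_to_words(vreg r, int le, uint32_t w[4]) {
    for (int i = 0; i < 4; i++) {
        uint32_t v = 0;
        for (int j = 0; j < 4; j++)
            v = (v << 8) | r.b[4 * (le ? 3 - i : i) + j];
        w[i] = v;
    }
}

/* vector char element i: register byte i (BE) or 15 - i (LE). */
static vreg chars_to_reg(const uint8_t c[16], int le) {
    vreg r;
    for (int i = 0; i < 16; i++)
        r.b[le ? 15 - i : i] = c[i];
    return r;
}

enum target { TGT_BE, TGT_LE_P8, TGT_LE_P9 };

/* vec_perm (a, b, c) on vector int, lowered as in the sequences above. */
static void vec_perm_model(const uint32_t a[4], const uint32_t b[4],
                           const uint8_t c[16], enum target tgt, uint32_t t[4]) {
    int le = (tgt != TGT_BE);
    vreg ra = words_to_reg(a, le), rb = words_to_reg(b, le);
    vreg rc = chars_to_reg(c, le), rt;
    if (tgt == TGT_BE) {
        rt = vperm_model(ra, rb, rc);            /* vperm t,a,b,c */
    } else if (tgt == TGT_LE_P8) {
        vreg d;
        for (int i = 0; i < 16; i++)
            d.b[i] = (uint8_t)~rc.b[i];          /* vnand d,c,c */
        rt = vperm_model(rb, ra, d);             /* vperm t,b,a,d */
    } else {
        rt = vpermr_model(rb, ra, rc);           /* vpermr t,b,a,c */
    }
    reg_to_words(rt, le, t);
}
```
]]></programlisting>
<para>
Running the model with the two PCVs used in this section reproduces
both the matching results for the whole-element PCV and the
differing results for the PCV that scrambles bytes across integers.
</para>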
<para>
Note also that the PCV <code>c</code> has element values that
are contiguous in groups of 4. This selects entire elements
from the input vectors <code>a</code> and <code>b</code> to
reorder. Thus the intent of the code is to select the first
integer element of <code>a</code>, the last integer element of
<code>b</code>, the last integer element of <code>a</code>,
and the second integer element of <code>b</code>, in that
order.
</para>
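<para>
Whether a PCV reorders only entire elements can be checked
mechanically. The hypothetical helper below, which is not part of
this specification, tests that each group of <code>size</code>
consecutive PCV bytes reads <code>{k, k+1, ..., k+size-1}</code> for
some <code>k</code> that is a multiple of <code>size</code>, so that
each group copies one naturally aligned element of the concatenated
inputs. It is a sketch of the condition stated at the start of this
section, not a compiler diagnostic.
</para>
<programlisting><![CDATA[
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if every group of `size` consecutive PCV
   entries reads {k, k+1, ..., k+size-1} with k a multiple of size, i.e.
   each group copies one whole, naturally aligned element of a||b. */
static bool pcv_selects_whole_elements(const uint8_t pcv[16], size_t size) {
    if (size == 0 || 16 % size != 0)
        return false;
    for (size_t g = 0; g < 16; g += size) {
        size_t first = pcv[g] & 31u;           /* hardware reads 5 bits */
        if (first % size != 0)
            return false;                      /* element not aligned */
        for (size_t j = 1; j < size; j++)
            if ((size_t)(pcv[g + j] & 31u) != first + j)
                return false;                  /* bytes not contiguous */
    }
    return true;
}
```
]]></programlisting>
<para>
For the PCV {0,1,2,3,28,29,30,31,12,13,14,15,20,21,22,23} used
above, the check succeeds for 4-byte elements and fails for 8-byte
elements, since the first eight entries do not form one contiguous,
aligned run.
</para>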
<para>
The big endian result is {0x00010203, 0x1c1d1e1f, 0x0c0d0e0f,
0x14151617}, as shown here:
</para>
<informaltable frame="all">
<tgroup cols="17">
<colspec colname="c1" colwidth="1*" />
<colspec colname="c2" colwidth="1*" />
<colspec colname="c3" colwidth="1*" />
<colspec colname="c4" colwidth="1*" />
<colspec colname="c5" colwidth="1*" />
<colspec colname="c6" colwidth="1*" />
<colspec colname="c7" colwidth="1*" />
<colspec colname="c8" colwidth="1*" />
<colspec colname="c9" colwidth="1*" />
<colspec colname="c10" colwidth="1*" />
<colspec colname="c11" colwidth="1*" />
<colspec colname="c12" colwidth="1*" />
<colspec colname="c13" colwidth="1*" />
<colspec colname="c14" colwidth="1*" />
<colspec colname="c15" colwidth="1*" />
<colspec colname="c16" colwidth="1*" />
<colspec colname="c17" colwidth="1*" />
<tbody>
<row>
<entry align="center">
<para><emphasis role="bold">a</emphasis></para>
</entry>
<entry align="center">
<para>00</para>
</entry>
<entry align="center">
<para>01</para>
</entry>
<entry align="center">
<para>02</para>
</entry>
<entry align="center">
<para>03</para>
</entry>
<entry align="center">
<para>04</para>
</entry>
<entry align="center">
<para>05</para>
</entry>
<entry align="center">
<para>06</para>
</entry>
<entry align="center">
<para>07</para>
</entry>
<entry align="center">
<para>08</para>
</entry>
<entry align="center">
<para>09</para>
</entry>
<entry align="center">
<para>0A</para>
</entry>
<entry align="center">
<para>0B</para>
</entry>
<entry align="center">
<para>0C</para>
</entry>
<entry align="center">
<para>0D</para>
</entry>
<entry align="center">
<para>0E</para>
</entry>
<entry align="center">
<para>0F</para>
</entry>
</row>
<row>
<entry align="center">
<para><emphasis role="bold">b</emphasis></para>
</entry>
<entry align="center">
<para>10</para>
</entry>
<entry align="center">
<para>11</para>
</entry>
<entry align="center">
<para>12</para>
</entry>
<entry align="center">
<para>13</para>
</entry>
<entry align="center">
<para>14</para>
</entry>
<entry align="center">
<para>15</para>
</entry>
<entry align="center">
<para>16</para>
</entry>
<entry align="center">
<para>17</para>
</entry>
<entry align="center">
<para>18</para>
</entry>
<entry align="center">
<para>19</para>
</entry>
<entry align="center">
<para>1A</para>
</entry>
<entry align="center">
<para>1B</para>
</entry>
<entry align="center">
<para>1C</para>
</entry>
<entry align="center">
<para>1D</para>
</entry>
<entry align="center">
<para>1E</para>
</entry>
<entry align="center">
<para>1F</para>
</entry>
</row>
<row>
<entry align="center">
<para><emphasis role="bold">c</emphasis></para>
</entry>
<entry align="center">
<para>0</para>
</entry>
<entry align="center">
<para>1</para>
</entry>
<entry align="center">
<para>2</para>
</entry>
<entry align="center">
<para>3</para>
</entry>
<entry align="center">
<para>28</para>
</entry>
<entry align="center">
<para>29</para>
</entry>
<entry align="center">
<para>30</para>
</entry>
<entry align="center">
<para>31</para>
</entry>
<entry align="center">
<para>12</para>
</entry>
<entry align="center">
<para>13</para>
</entry>
<entry align="center">
<para>14</para>
</entry>
<entry align="center">
<para>15</para>
</entry>
<entry align="center">
<para>20</para>
</entry>
<entry align="center">
<para>21</para>
</entry>
<entry align="center">
<para>22</para>
</entry>
<entry align="center">
<para>23</para>
</entry>
</row>
<row>
<entry align="center">
<para><emphasis role="bold">t</emphasis></para>
</entry>
<entry align="center">
<para>00</para>
</entry>
<entry align="center">
<para>01</para>
</entry>
<entry align="center">
<para>02</para>
</entry>
<entry align="center">
<para>03</para>
</entry>
<entry align="center">
<para>1C</para>
</entry>
<entry align="center">
<para>1D</para>
</entry>
<entry align="center">
<para>1E</para>
</entry>
<entry align="center">
<para>1F</para>
</entry>
<entry align="center">
<para>0C</para>
</entry>
<entry align="center">
<para>0D</para>
</entry>
<entry align="center">
<para>0E</para>
</entry>
<entry align="center">
<para>0F</para>
</entry>
<entry align="center">
<para>14</para>
</entry>
<entry align="center">
<para>15</para>
</entry>
<entry align="center">
<para>16</para>
</entry>
<entry align="center">
<para>17</para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>
For little endian, the modified PCV is obtained by subtracting
each element of the original PCV from 31, giving
{31,30,29,28,3,2,1,0,19,18,17,16,11,10,9,8}. Because elements
loaded from little-endian memory appear in reverse order in a
register, these values appear in the register from left to right
as {8,9,10,11,16,17,18,19,0,1,2,3,28,29,30,31}. The
<code>vperm</code> instruction therefore again selects entire
elements using groups of 4 contiguous bytes, and the integers are
reordered without compromising the contents of any integer. The
little-endian result matches the
big-endian result, as shown. Observe that <emphasis
role="bold">a</emphasis> and <emphasis
role="bold">b</emphasis> switch positions for little endian
code generation.
</para>
<informaltable frame="all">
<tgroup cols="17">
<colspec colname="c1" colwidth="1*" />
<colspec colname="c2" colwidth="1*" />
<colspec colname="c3" colwidth="1*" />
<colspec colname="c4" colwidth="1*" />
<colspec colname="c5" colwidth="1*" />
<colspec colname="c6" colwidth="1*" />
<colspec colname="c7" colwidth="1*" />
<colspec colname="c8" colwidth="1*" />
<colspec colname="c9" colwidth="1*" />
<colspec colname="c10" colwidth="1*" />
<colspec colname="c11" colwidth="1*" />
<colspec colname="c12" colwidth="1*" />
<colspec colname="c13" colwidth="1*" />
<colspec colname="c14" colwidth="1*" />
<colspec colname="c15" colwidth="1*" />
<colspec colname="c16" colwidth="1*" />
<colspec colname="c17" colwidth="1*" />
<tbody>
<row>
<entry align="center">
<para><emphasis role="bold">b</emphasis></para>
</entry>
<entry align="center">
<para>1C</para>
</entry>
<entry align="center">
<para>1D</para>
</entry>
<entry align="center">
<para>1E</para>
</entry>
<entry align="center">
<para>1F</para>
</entry>
<entry align="center">
<para>18</para>
</entry>
<entry align="center">
<para>19</para>
</entry>
<entry align="center">
<para>1A</para>
</entry>
<entry align="center">
<para>1B</para>
</entry>
<entry align="center">
<para>14</para>
</entry>
<entry align="center">
<para>15</para>
</entry>
<entry align="center">
<para>16</para>
</entry>
<entry align="center">
<para>17</para>
</entry>
<entry align="center">
<para>10</para>
</entry>
<entry align="center">
<para>11</para>
</entry>
<entry align="center">
<para>12</para>
</entry>
<entry align="center">
<para>13</para>
</entry>
</row>
<row>
<entry align="center">
<para><emphasis role="bold">a</emphasis></para>
</entry>
<entry align="center">
<para>0C</para>
</entry>
<entry align="center">
<para>0D</para>
</entry>
<entry align="center">
<para>0E</para>
</entry>
<entry align="center">
<para>0F</para>
</entry>
<entry align="center">
<para>08</para>
</entry>
<entry align="center">
<para>09</para>
</entry>
<entry align="center">
<para>0A</para>
</entry>
<entry align="center">
<para>0B</para>
</entry>
<entry align="center">
<para>04</para>
</entry>
<entry align="center">
<para>05</para>
</entry>
<entry align="center">
<para>06</para>
</entry>
<entry align="center">
<para>07</para>
</entry>
<entry align="center">
<para>00</para>
</entry>
<entry align="center">
<para>01</para>
</entry>
<entry align="center">
<para>02</para>
</entry>
<entry align="center">
<para>03</para>
</entry>
</row>
<row>
<entry align="center">
<para><emphasis role="bold">c</emphasis></para>
</entry>
<entry align="center">
<para>8</para>
</entry>
<entry align="center">
<para>9</para>
</entry>
<entry align="center">
<para>10</para>
</entry>
<entry align="center">
<para>11</para>
</entry>
<entry align="center">
<para>16</para>
</entry>
<entry align="center">
<para>17</para>
</entry>
<entry align="center">
<para>18</para>
</entry>
<entry align="center">
<para>19</para>
</entry>
<entry align="center">
<para>0</para>
</entry>
<entry align="center">
<para>1</para>
</entry>
<entry align="center">
<para>2</para>
</entry>
<entry align="center">
<para>3</para>
</entry>
<entry align="center">
<para>28</para>
</entry>
<entry align="center">
<para>29</para>
</entry>
<entry align="center">
<para>30</para>
</entry>
<entry align="center">
<para>31</para>
</entry>
</row>
<row>
<entry align="center">
<para><emphasis role="bold">t</emphasis></para>
</entry>
<entry align="center">
<para>14</para>
</entry>
<entry align="center">
<para>15</para>
</entry>
<entry align="center">
<para>16</para>
</entry>
<entry align="center">
<para>17</para>
</entry>
<entry align="center">
<para>0C</para>
</entry>
<entry align="center">
<para>0D</para>
</entry>
<entry align="center">
<para>0E</para>
</entry>
<entry align="center">
<para>0F</para>
</entry>
<entry align="center">
<para>1C</para>
</entry>
<entry align="center">
<para>1D</para>
</entry>
<entry align="center">
<para>1E</para>
</entry>
<entry align="center">
<para>1F</para>
</entry>
<entry align="center">
<para>00</para>
</entry>
<entry align="center">
<para>01</para>
</entry>
<entry align="center">
<para>02</para>
</entry>
<entry align="center">
<para>03</para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>
Now, suppose instead that the original PCV does not reorder
entire integers at once:
</para>
<programlisting> vector char c = (vector char){0,20,31,4,7,17,6,19,30,3,2,8,9,13,5,22};</programlisting>
<para>
The result of the big-endian implementation would be:
</para>
<programlisting> t = {0x00141f04, 0x07110613, 0x1e030208, 0x090d0516};</programlisting>
<informaltable frame="all">
<tgroup cols="17">
<colspec colname="c1" colwidth="1*" />
<colspec colname="c2" colwidth="1*" />
<colspec colname="c3" colwidth="1*" />
<colspec colname="c4" colwidth="1*" />
<colspec colname="c5" colwidth="1*" />
<colspec colname="c6" colwidth="1*" />
<colspec colname="c7" colwidth="1*" />
<colspec colname="c8" colwidth="1*" />
<colspec colname="c9" colwidth="1*" />
<colspec colname="c10" colwidth="1*" />
<colspec colname="c11" colwidth="1*" />
<colspec colname="c12" colwidth="1*" />
<colspec colname="c13" colwidth="1*" />
<colspec colname="c14" colwidth="1*" />
<colspec colname="c15" colwidth="1*" />
<colspec colname="c16" colwidth="1*" />
<colspec colname="c17" colwidth="1*" />
<tbody>
<row>
<entry align="center">
<para><emphasis role="bold">a</emphasis></para>
</entry>
<entry align="center">
<para>00</para>
</entry>
<entry align="center">
<para>01</para>
</entry>
<entry align="center">
<para>02</para>
</entry>
<entry align="center">
<para>03</para>
</entry>
<entry align="center">
<para>04</para>
</entry>
<entry align="center">
<para>05</para>
</entry>
<entry align="center">
<para>06</para>
</entry>
<entry align="center">
<para>07</para>
</entry>
<entry align="center">
<para>08</para>
</entry>
<entry align="center">
<para>09</para>
</entry>
<entry align="center">
<para>0A</para>
</entry>
<entry align="center">
<para>0B</para>
</entry>
<entry align="center">
<para>0C</para>
</entry>
<entry align="center">
<para>0D</para>
</entry>
<entry align="center">
<para>0E</para>
</entry>
<entry align="center">
<para>0F</para>
</entry>
</row>
<row>
<entry align="center">
<para><emphasis role="bold">b</emphasis></para>
</entry>
<entry align="center">
<para>10</para>
</entry>
<entry align="center">
<para>11</para>
</entry>
<entry align="center">
<para>12</para>
</entry>
<entry align="center">
<para>13</para>
</entry>
<entry align="center">
<para>14</para>
</entry>
<entry align="center">
<para>15</para>
</entry>
<entry align="center">
<para>16</para>
</entry>
<entry align="center">
<para>17</para>
</entry>
<entry align="center">
<para>18</para>
</entry>
<entry align="center">
<para>19</para>
</entry>
<entry align="center">
<para>1A</para>
</entry>
<entry align="center">
<para>1B</para>
</entry>
<entry align="center">
<para>1C</para>
</entry>
<entry align="center">
<para>1D</para>
</entry>
<entry align="center">
<para>1E</para>
</entry>
<entry align="center">
<para>1F</para>
</entry>
</row>
<row>
<entry align="center">
<para><emphasis role="bold">c</emphasis></para>
</entry>
<entry align="center">
<para>0</para>
</entry>
<entry align="center">
<para>20</para>
</entry>
<entry align="center">
<para>31</para>
</entry>
<entry align="center">
<para>4</para>
</entry>
<entry align="center">
<para>7</para>
</entry>
<entry align="center">
<para>17</para>
</entry>
<entry align="center">
<para>6</para>
</entry>
<entry align="center">
<para>19</para>
</entry>
<entry align="center">
<para>30</para>
</entry>
<entry align="center">
<para>3</para>
</entry>
<entry align="center">
<para>2</para>
</entry>
<entry align="center">
<para>8</para>
</entry>
<entry align="center">
<para>9</para>
</entry>
<entry align="center">
<para>13</para>
</entry>
<entry align="center">
<para>5</para>
</entry>
<entry align="center">
<para>22</para>
</entry>
</row>
<row>
<entry align="center">
<para><emphasis role="bold">t</emphasis></para>
</entry>
<entry align="center">
<para>00</para>
</entry>
<entry align="center">
<para>14</para>
</entry>
<entry align="center">
<para>1F</para>
</entry>
<entry align="center">
<para>04</para>
</entry>
<entry align="center">
<para>07</para>
</entry>
<entry align="center">
<para>11</para>
</entry>
<entry align="center">
<para>06</para>
</entry>
<entry align="center">
<para>13</para>
</entry>
<entry align="center">
<para>1E</para>
</entry>
<entry align="center">
<para>03</para>
</entry>
<entry align="center">
<para>02</para>
</entry>
<entry align="center">
<para>08</para>
</entry>
<entry align="center">
<para>09</para>
</entry>
<entry align="center">
<para>0D</para>
</entry>
<entry align="center">
<para>05</para>
</entry>
<entry align="center">
<para>16</para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>
For little endian, the modified PCV would be
{31,11,0,27,24,14,25,12,1,28,29,23,22,18,26,9}, appearing in
the register as
{9,26,18,22,23,29,28,1,12,25,14,24,27,0,11,31}. The final
little-endian result would be
</para>
<programlisting> t = {0x071c1703, 0x10051204, 0x0b01001d, 0x15060e0a};</programlisting>
<para>
which bears no resemblance to the big-endian result.
</para>
<informaltable frame="all">
<tgroup cols="17">
<colspec colname="c1" colwidth="1*" />
<colspec colname="c2" colwidth="1*" />
<colspec colname="c3" colwidth="1*" />
<colspec colname="c4" colwidth="1*" />
<colspec colname="c5" colwidth="1*" />
<colspec colname="c6" colwidth="1*" />
<colspec colname="c7" colwidth="1*" />
<colspec colname="c8" colwidth="1*" />
<colspec colname="c9" colwidth="1*" />
<colspec colname="c10" colwidth="1*" />
<colspec colname="c11" colwidth="1*" />
<colspec colname="c12" colwidth="1*" />
<colspec colname="c13" colwidth="1*" />
<colspec colname="c14" colwidth="1*" />
<colspec colname="c15" colwidth="1*" />
<colspec colname="c16" colwidth="1*" />
<colspec colname="c17" colwidth="1*" />
<tbody>
<row>
<entry align="center">
<para><emphasis role="bold">b</emphasis></para>
</entry>
<entry align="center">
<para>1C</para>
</entry>
<entry align="center">
<para>1D</para>
</entry>
<entry align="center">
<para>1E</para>
</entry>
<entry align="center">
<para>1F</para>
</entry>
<entry align="center">
<para>18</para>
</entry>
<entry align="center">
<para>19</para>
</entry>
<entry align="center">
<para>1A</para>
</entry>
<entry align="center">
<para>1B</para>
</entry>
<entry align="center">
<para>14</para>
</entry>
<entry align="center">
<para>15</para>
</entry>
<entry align="center">
<para>16</para>
</entry>
<entry align="center">
<para>17</para>
</entry>
<entry align="center">
<para>10</para>
</entry>
<entry align="center">
<para>11</para>
</entry>
<entry align="center">
<para>12</para>
</entry>
<entry align="center">
<para>13</para>
</entry>
</row>
<row>
<entry align="center">
<para><emphasis role="bold">a</emphasis></para>
</entry>
<entry align="center">
<para>0C</para>
</entry>
<entry align="center">
<para>0D</para>
</entry>
<entry align="center">
<para>0E</para>
</entry>
<entry align="center">
<para>0F</para>
</entry>
<entry align="center">
<para>08</para>
</entry>
<entry align="center">
<para>09</para>
</entry>
<entry align="center">
<para>0A</para>
</entry>
<entry align="center">
<para>0B</para>
</entry>
<entry align="center">
<para>04</para>
</entry>
<entry align="center">
<para>05</para>
</entry>
<entry align="center">
<para>06</para>
</entry>
<entry align="center">
<para>07</para>
</entry>
<entry align="center">
<para>00</para>
</entry>
<entry align="center">
<para>01</para>
</entry>
<entry align="center">
<para>02</para>
</entry>
<entry align="center">
<para>03</para>
</entry>
</row>
<row>
<entry align="center">
<para><emphasis role="bold">c</emphasis></para>
</entry>
<entry align="center">
<para>9</para>
</entry>
<entry align="center">
<para>26</para>
</entry>
<entry align="center">
<para>18</para>
</entry>
<entry align="center">
<para>22</para>
</entry>
<entry align="center">
<para>23</para>
</entry>
<entry align="center">
<para>29</para>
</entry>
<entry align="center">
<para>28</para>
</entry>
<entry align="center">
<para>1</para>
</entry>
<entry align="center">
<para>12</para>
</entry>
<entry align="center">
<para>25</para>
</entry>
<entry align="center">
<para>14</para>
</entry>
<entry align="center">
<para>24</para>
</entry>
<entry align="center">
<para>27</para>
</entry>
<entry align="center">
<para>0</para>
</entry>
<entry align="center">
<para>11</para>
</entry>
<entry align="center">
<para>31</para>
</entry>
</row>
<row>
<entry align="center">
<para><emphasis role="bold">t</emphasis></para>
</entry>
<entry align="center">
<para>15</para>
</entry>
<entry align="center">
<para>06</para>
</entry>
<entry align="center">
<para>0E</para>
</entry>
<entry align="center">
<para>0A</para>
</entry>
<entry align="center">
<para>0B</para>
</entry>
<entry align="center">
<para>01</para>
</entry>
<entry align="center">
<para>00</para>
</entry>
<entry align="center">
<para>1D</para>
</entry>
<entry align="center">
<para>10</para>
</entry>
<entry align="center">
<para>05</para>
</entry>
<entry align="center">
<para>12</para>
</entry>
<entry align="center">
<para>04</para>
</entry>
<entry align="center">
<para>07</para>
</entry>
<entry align="center">
<para>1C</para>
</entry>
<entry align="center">
<para>17</para>
</entry>
<entry align="center">
<para>03</para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
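<para>
The register-level steps above can be checked with a small host-side
model. The following C sketch is not a compiler implementation: it
models <code>vperm</code> on 16-byte register images and assumes the
little-endian treatment described in this section (each PCV element
subtracted from 31, inputs swapped). The inputs
a = {0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f},
b = {0x10111213, 0x14151617, 0x18191a1b, 0x1c1d1e1f} and the original
PCV c = {0,20,31,4,7,17,6,19,30,3,2,8,9,13,5,22} are read off the
tables in this section, and the helper names (<code>vperm</code>,
<code>le_words_to_reg</code>, <code>le_vec_perm</code>) are
hypothetical.
</para>
<programlisting language="c"><![CDATA[#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Model of the vperm instruction on register images (byte 0 is the
   leftmost register byte): result byte i is byte (pcv[i] & 0x1F) of
   the 32-byte concatenation vra || vrb. */
static void vperm(const uint8_t vra[16], const uint8_t vrb[16],
                  const uint8_t pcv[16], uint8_t vrt[16])
{
    for (int i = 0; i < 16; i++) {
        int idx = pcv[i] & 0x1F;
        vrt[i] = idx < 16 ? vra[idx] : vrb[idx - 16];
    }
}

/* Little-endian register image of four 32-bit words: element 0 sits in
   the rightmost word, and each word is shown most-significant byte first. */
static void le_words_to_reg(const uint32_t w[4], uint8_t reg[16])
{
    for (int e = 0; e < 4; e++)
        for (int k = 0; k < 4; k++)
            reg[4 * (3 - e) + k] = (uint8_t)(w[e] >> (24 - 8 * k));
}

/* Inverse of le_words_to_reg. */
static void le_reg_to_words(const uint8_t reg[16], uint32_t w[4])
{
    for (int e = 0; e < 4; e++) {
        w[e] = 0;
        for (int k = 0; k < 4; k++)
            w[e] = (w[e] << 8) | reg[4 * (3 - e) + k];
    }
}

/* Little-endian vec_perm (a, b, c) as modeled here: replace each PCV
   element with 31 - c[i], load it into the register (char element 15
   leftmost), and pass b and a to vperm in swapped order. */
static void le_vec_perm(const uint32_t a[4], const uint32_t b[4],
                        const uint8_t c[16], uint32_t t[4])
{
    uint8_t ra[16], rb[16], rc[16], rt[16];

    le_words_to_reg(a, ra);
    le_words_to_reg(b, rb);
    for (int i = 0; i < 16; i++)
        rc[15 - i] = (uint8_t)(31 - c[i]);
    vperm(rb, ra, rc, rt);
    le_reg_to_words(rt, t);
}

int main(void)
{
    const uint32_t a[4] = {0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f};
    const uint32_t b[4] = {0x10111213, 0x14151617, 0x18191a1b, 0x1c1d1e1f};
    const uint8_t c[16] = {0, 20, 31, 4, 7, 17, 6, 19,
                           30, 3, 2, 8, 9, 13, 5, 22};
    const uint32_t expected[4] = {0x071c1703, 0x10051204,
                                  0x0b01001d, 0x15060e0a};
    uint32_t t[4];

    le_vec_perm(a, b, c, t);
    for (int e = 0; e < 4; e++) {
        assert(t[e] == expected[e]);
        printf("t[%d] = 0x%08" PRIx32 "\n", e, t[e]);
    }
    return 0;
}]]></programlisting>
<para>
Each printed word matches the little-endian result given above, and the
intermediate register image <code>rt</code> holds the bytes of the
<emphasis role="bold">t</emphasis> row of the table.
</para>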
<para>
The lesson here is to use
<code><xref linkend="vec_perm" xrefstyle="select:title nopage"/></code>
only to reorder entire elements of a vector. If you must use
<code>vec_perm</code> for another purpose, your code must include a
test for endianness and separate algorithms for big- and
little-endian targets. Examples of this approach can be found in the
Power Vector Library project (see <xref linkend="VIPR.intro.links"
/>).
</para>
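<para>
One way to see why only whole-element reordering is endian-neutral is
to model <code>vec_perm</code> in element order, storing each 32-bit
word in big- or little-endian byte order. In the C sketch below (all
names are hypothetical and not part of any library), reversing the four
words of a gives the same result under both byte orders, while the
byte-level PCV from this section needs a separately computed
little-endian PCV. The conversion helper <code>be_pcv_to_le</code> is
an assumption that applies to 32-bit elements only; in real code the
two PCVs would be selected by an endianness test such as the GCC and
Clang predefined macro <code>__BYTE_ORDER__</code>.
</para>
<programlisting language="c"><![CDATA[#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Byte order used when a 32-bit element is stored as four bytes. */
enum byte_order { BE_ORDER, LE_ORDER };

/* Store four words as 16 bytes in element (memory) order. */
static void to_bytes(const uint32_t w[4], enum byte_order o, uint8_t out[16])
{
    for (int e = 0; e < 4; e++)
        for (int k = 0; k < 4; k++)
            out[4 * e + k] =
                (uint8_t)(w[e] >> (o == BE_ORDER ? 24 - 8 * k : 8 * k));
}

/* Inverse of to_bytes. */
static void from_bytes(const uint8_t in[16], enum byte_order o, uint32_t w[4])
{
    for (int e = 0; e < 4; e++) {
        w[e] = 0;
        for (int k = 0; k < 4; k++)
            w[e] |= (uint32_t)in[4 * e + k]
                    << (o == BE_ORDER ? 24 - 8 * k : 8 * k);
    }
}

/* Element-order model of vec_perm: result byte i is byte (c[i] & 31) of
   the concatenation a || b, with bytes numbered in element order. */
static void model_vec_perm(const uint32_t a[4], const uint32_t b[4],
                           const uint8_t c[16], enum byte_order o,
                           uint32_t t[4])
{
    uint8_t ab[32], r[16];

    to_bytes(a, o, ab);
    to_bytes(b, o, ab + 16);
    for (int i = 0; i < 16; i++)
        r[i] = ab[c[i] & 31];
    from_bytes(r, o, t);
}

/* Hypothetical helper, valid for 32-bit elements only: build the
   little-endian PCV that reproduces a PCV written for big-endian.
   Byte k of each word moves to position k ^ 3, both in the inputs
   and in the result. */
static void be_pcv_to_le(const uint8_t be[16], uint8_t le[16])
{
    for (int j = 0; j < 16; j++)
        le[j] = (uint8_t)(be[j ^ 3] ^ 3);
}

int main(void)
{
    const uint32_t a[4] = {0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f};
    const uint32_t b[4] = {0x10111213, 0x14151617, 0x18191a1b, 0x1c1d1e1f};
    /* Reverses the four words of a: whole elements only. */
    const uint8_t rev[16] = {12, 13, 14, 15, 8, 9, 10, 11,
                             4, 5, 6, 7, 0, 1, 2, 3};
    /* The byte-level PCV from this section, in big-endian numbering. */
    const uint8_t c_be[16] = {0, 20, 31, 4, 7, 17, 6, 19,
                              30, 3, 2, 8, 9, 13, 5, 22};
    uint8_t c_le[16];
    uint32_t x[4], y[4];

    /* Whole-element reordering gives the same words for both byte orders. */
    model_vec_perm(a, b, rev, BE_ORDER, x);
    model_vec_perm(a, b, rev, LE_ORDER, y);
    for (int e = 0; e < 4; e++)
        assert(x[e] == y[e] && x[e] == a[3 - e]);

    /* Byte-level reordering needs a separate little-endian PCV. Real code
       would choose one at compile time, e.g. by comparing __BYTE_ORDER__
       with __ORDER_LITTLE_ENDIAN__ (GCC/Clang predefined macros). */
    be_pcv_to_le(c_be, c_le);
    model_vec_perm(a, b, c_be, BE_ORDER, x);
    model_vec_perm(a, b, c_le, LE_ORDER, y);
    for (int e = 0; e < 4; e++) {
        assert(x[e] == y[e]);
        printf("t[%d] = 0x%08" PRIx32 "\n", e, x[e]);
    }
    return 0;
}]]></programlisting>
<para>
Compiled and run, the program prints t[0] = 0x00141f04,
t[1] = 0x07110613, t[2] = 0x1e030208 and t[3] = 0x090d0516: the
big-endian result, reproduced under little-endian byte order only
because a different PCV was used.
</para>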
</section>
</section>
</chapter>