<!--
Copyright (c) 2019 OpenPOWER Foundation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-->
<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">

<!-- Chapter Title goes here. -->
<title>The Power Bi-Endian Vector Programming Model</title>
<para>
To ensure portability of applications optimized to exploit the
SIMD functions of Power ISA processors, this reference defines a
set of functions and data types for SIMD programming. Compliant
compilers will provide suitable support for these functions,
preferably as built-in functions that translate to one or more
Power ISA instructions.
</para>
<para>
Compilers are encouraged, but not required, to provide built-in
functions to access individual instructions in the IBM Power®
instruction set architecture. In most cases, each such built-in
function should provide direct access to the underlying
instruction.
</para>
<para>
However, to ease porting between little-endian (LE) and big-endian
(BE) Power systems, and between Power and other platforms, it is
preferable that some built-in functions provide the same semantics
on both LE and BE Power systems, even if this means that the
built-in functions are implemented with different instruction
sequences for LE and BE. To achieve this, the vector built-in
functions define a set of operations derived from the set of
hardware functions provided by the Power SIMD instructions. Unlike
traditional “hardware intrinsic” built-in functions, no fixed
mapping exists between these built-in functions and the generated
hardware instruction sequence. Rather, the compiler is free to
generate optimized instruction sequences that implement the
semantics of the program specified by the programmer using these
built-in functions.
</para>
<para>
As we've seen, the Power SIMD instructions operate on groups of 1,
2, 4, 8, or 16 vector elements at a time in 128-bit registers. On
a big-endian Power platform, vector elements are loaded from
memory into a register so that the 0th element occupies the
high-order bits of the register, and the (N – 1)th element
occupies the low-order bits of the register. This is referred to
as big-endian element order. On a little-endian Power platform,
vector elements are loaded from memory such that the 0th element
occupies the low-order bits of the register, and the (N – 1)th
element occupies the high-order bits. This is referred to as
little-endian element order.
</para>

<note>
<para>
Much of the information in this chapter was formerly part of
Chapter 6 of the 64-Bit ELF V2 ABI Specification for Power.
</para>
</note>

<section>
<title>Language Elements</title>
<para>
The C and C++ languages are extended to use new identifiers
<code>vector</code>, <code>pixel</code>, <code>bool</code>,
<code>__vector</code>, <code>__pixel</code>, and
<code>__bool</code>. These keywords are used to specify vector
data types (<xref linkend="VIPR.ch-data-types" />). Because
these identifiers may conflict with keywords in more recent
language standards for C and C++, compilers may implement these
in one of two ways.
</para>
<itemizedlist>
<listitem>
<para>
<code>__vector</code>, <code>__pixel</code>,
<code>__bool</code>, and <code>bool</code> are defined as
keywords, with <code>vector</code> and <code>pixel</code> as
predefined macros that expand to <code>__vector</code> and
<code>__pixel</code>, respectively.
</para>
</listitem>
<listitem>
<para>
<code>__vector</code>, <code>__pixel</code>, and
<code>__bool</code> are defined as keywords in all contexts,
while <code>vector</code>, <code>pixel</code>, and
<code>bool</code> are treated as keywords only within the
context of a type declaration.
</para>
</listitem>
</itemizedlist>
<para>
As a motivating example, the <emphasis
role="bold">vector</emphasis> token is used as a type in the
C++ Standard Template Library, and hence cannot be used as an
unrestricted keyword, but can be used in the context-sensitive
implementation. For example, <emphasis role="bold">vector
char</emphasis> is distinct from <emphasis
role="bold">std::vector</emphasis> in the context-sensitive
implementation.
</para>
<para>
Vector literals may be specified using a type cast and a set of
literal initializers in parentheses or braces. For example,
</para>
<programlisting>vector int x = (vector int) (4, -1, 3, 6);
vector double g = (vector double) { 3.5, -24.6 };</programlisting>
<para>
Current C compilers do not support literals for
<code>__int128</code> types. When constructing a <code>vector
__int128</code> constant from smaller literals such as
<code>int</code> or <code>long long</code>, you must test for
endianness and reverse the order of the smaller literals for
little-endian mode.
</para>
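<para>
As a minimal sketch of that endianness test, the following portable C
fragment builds a 128-bit constant from two 64-bit literals. A union
stands in for a <code>vector __int128</code> object so that the
fragment also compiles on non-Power hosts. The names
<code>quad_t</code>, <code>QUAD_HI</code>, <code>QUAD_LO</code>, and
<code>quad_const</code> are hypothetical, and the
<code>__BYTE_ORDER__</code> test assumes a GCC- or Clang-compatible
compiler that supports <code>__int128</code>.
</para>

```c
#include <stdint.h>

/* Stand-in for a 128-bit vector object: two doublewords in memory
   (element) order, overlaid on the quadword that they form. */
typedef union {
    uint64_t dw[2];
    unsigned __int128 q;
} quad_t;

#define QUAD_HI UINT64_C(0x0123456789ABCDEF) /* most significant half  */
#define QUAD_LO UINT64_C(0xFEDCBA9876543210) /* least significant half */

/* The initializer lists the halves in memory order, so the order is
   reversed for little-endian mode. */
static const quad_t quad_const = {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    { QUAD_LO, QUAD_HI }  /* LE: the low half is element 0  */
#else
    { QUAD_HI, QUAD_LO }  /* BE: the high half is element 0 */
#endif
};
```

<para>
With either ordering, <code>quad_const.q</code> holds the same
128-bit value; only the order in which the halves are written in the
initializer differs.
</para>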
</section>

<section xml:id="VIPR.ch-data-types">
<title>Vector Data Types</title>
<para>
Languages provide support for the data types in <xref
linkend="VIPR.biendian.vectypes" /> to represent vector data
types stored in vector registers.
</para>
<para>
For the C and C++ programming languages (and related/derived
languages), the "Power SIMD C Types" listed in the leftmost
column of <xref linkend="VIPR.biendian.vectypes" /> may be used
when Power SIMD language extensions are enabled. Either
<code>vector</code> or <code>__vector</code> may be used in the
type name. Note that the ELFv2 ABI for Power also includes a
<code>vector _Float16</code> data type. As of this writing, no
current compilers for Power have implemented such a type. This
document does not include that type or any intrinsics related to
it.
</para>
<para>
For the Fortran language, <xref
linkend="VIPR.biendian.fortran-types" /> gives a correspondence
between Fortran and C/C++ language types.
</para>
<para>
The assignment operator always performs a byte-by-byte data copy
for vector data types.
</para>
<para>
Like other C/C++ language types, vector types may be defined to
have const or volatile properties. Vector data types can be
defined as being in static, auto, and register storage.
</para>
<para>
Pointers to vector types are defined like pointers to other
C/C++ types. Pointers to vector objects may be defined to have
const and volatile properties. Pointers to vector objects must
hold addresses divisible by 16, as vector objects are always
aligned on quadword (16-byte, or 128-bit) boundaries.
</para>
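<para>
The divisibility rule can be checked at run time. The following is a
minimal sketch in portable C11, not part of the programming model
itself: <code>is_quadword_aligned</code> and
<code>aligned_buf</code> are hypothetical names, and
<code>_Alignas</code> is used here only to obtain a buffer with
quadword alignment on any host.
</para>

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if p is aligned on a quadword
   (16-byte) boundary, that is, if its address is divisible by 16. */
static int is_quadword_aligned(const void *p)
{
    return ((uintptr_t)p & (uintptr_t)15) == 0;
}

/* A buffer whose first byte is guaranteed to be quadword aligned. */
static _Alignas(16) unsigned char aligned_buf[32];
```

<para>
An address such as <code>aligned_buf + 1</code> fails this check and
therefore must not be used as a pointer to a vector object.
</para>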
<para>
The preferred way to access vectors at an application-defined
address is by using vector pointers and the C/C++ dereference
operator <code>*</code>. As with other C/C++ data types, the
array reference operator <code>[]</code> may be applied to a
vector pointer, with the usual meaning of accessing the
<emphasis>N</emphasis>th vector object relative to that
pointer. The dereference operator <code>*</code> may
<emphasis>not</emphasis> be used to access data that is not
aligned at least to a quadword boundary. Built-in functions
such as <code>vec_xl</code> and <code>vec_xst</code> are
provided for unaligned data access. Please refer to <xref
linkend="VIPR.biendian.unaligned" /> for an example.
</para>
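<para>
To illustrate the distinction between aligned dereference and
unaligned access on any host, the sketch below uses
<code>memcpy</code> as a portable analogue of an unaligned 16-byte
load and store. It does not call <code>vec_xl</code> or
<code>vec_xst</code>; the type <code>quad_words</code> and the
helpers <code>load_unaligned</code> and
<code>store_unaligned</code> are hypothetical stand-ins.
</para>

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a 128-bit vector of four words, quadword aligned. */
typedef struct {
    _Alignas(16) int32_t w[4];
} quad_words;

/* Hypothetical helper: copy 16 bytes from an address with no
   alignment requirement into an aligned object. */
static quad_words load_unaligned(const void *p)
{
    quad_words v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: copy an aligned object out to an address
   with no alignment requirement. */
static void store_unaligned(void *p, quad_words v)
{
    memcpy(p, &v, sizeof v);
}
```

<para>
Dereferencing a <code>quad_words</code> pointer formed from a
misaligned address would be invalid; going through the byte copy
avoids forming such a pointer at all.
</para>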
<para>
One vector type may be cast to another vector type without
restriction. Such a cast is simply a reinterpretation of the
bits, and does not change the data.
</para>
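<para>
The bit-reinterpretation behavior can be sketched with the generic
<code>vector_size</code> attribute of GCC and Clang, used here as an
assumed stand-in for the Power <code>vector</code> types so that the
fragment compiles on other hosts as well. The type names
<code>v4u32</code> and <code>v16u8</code> and both helper functions
are hypothetical.
</para>

```c
#include <stdint.h>
#include <string.h>

/* Generic 128-bit vector types (GCC/Clang extension). */
typedef uint32_t v4u32 __attribute__((vector_size(16)));
typedef uint8_t  v16u8 __attribute__((vector_size(16)));

/* The cast changes only the element type; the 128 bits are kept. */
static v16u8 as_bytes(v4u32 v)
{
    return (v16u8)v;
}

/* Nonzero if both vectors hold exactly the same 16 bytes. */
static int same_bits(v4u32 w, v16u8 b)
{
    return memcmp(&w, &b, sizeof w) == 0;
}
```

<para>
Casting back with <code>(v4u32)</code> recovers the original word
values, since no conversion of the data takes place in either
direction.
</para>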
<para>
Compilers are expected to recognize sequences of operations
that can be combined into a single hardware
instruction. For example, a load-and-splat hardware instruction
(such as <emphasis role="bold">lxvdsx</emphasis>)
might be generated for the following sequence:
</para>
<programlisting>double *double_ptr;
register vector double vd = vec_splats(*double_ptr);</programlisting>
<table frame="all" pgwide="1" xml:id="VIPR.biendian.vectypes">
<title>Vector Types</title>
<tgroup cols="4">
<colspec colname="c1" colwidth="20*" />
<colspec colname="c2" colwidth="10*" align="center" />
<colspec colname="c3" colwidth="15*" align="center" />
<colspec colname="c4" colwidth="40*" />
<thead>
<row>
<entry align="center">
<para>
<emphasis role="bold">Power SIMD C Types</emphasis>
</para>
</entry>
<entry align="center">
<para>
<emphasis role="bold">sizeof</emphasis>
</para>
</entry>
<entry align="center">
<para>
<emphasis role="bold">Alignment</emphasis>
</para>
</entry>
<entry align="center">
<para>
<emphasis role="bold">Description</emphasis>
</para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<para>vector unsigned char</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 16 unsigned bytes.</para>
</entry>
</row>
<row>
<entry>
<para>vector signed char</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 16 signed bytes.</para>
</entry>
</row>
<row>
<entry>
<para>vector bool char</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 16 bytes with a value of either 0 or
2<superscript>8</superscript> – 1.</para>
</entry>
</row>
<row>
<entry>
<para>vector unsigned short</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 8 unsigned halfwords.</para>
</entry>
</row>
<row>
<entry>
<para>vector signed short</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 8 signed halfwords.</para>
</entry>
</row>
<row>
<entry>
<para>vector bool short</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 8 halfwords with a value of either 0 or
2<superscript>16</superscript> – 1.</para>
</entry>
</row>
<row>
<entry>
<para>vector pixel</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 8 halfwords, each interpreted as a 1-bit
channel and three 5-bit channels.</para>
</entry>
</row>
<row>
<entry>
<para>vector unsigned int</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 4 unsigned words.</para>
</entry>
</row>
<row>
<entry>
<para>vector signed int</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 4 signed words.</para>
</entry>
</row>
<row>
<entry>
<para>vector bool int</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 4 words with a value of either 0 or
2<superscript>32</superscript> – 1.</para>
</entry>
</row>
<row>
<entry>
<para>vector unsigned long<footnote xml:id="vlong">
<para>The vector long types are deprecated due to their
ambiguity between 32-bit and 64-bit environments. The use
of the vector long long types is preferred.</para>
</footnote></para>
<para>vector unsigned long long</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 2 unsigned doublewords.</para>
</entry>
</row>
<row>
<entry>
<para>vector signed long<footnoteref linkend="vlong" /></para>
<para>vector signed long long</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 2 signed doublewords.</para>
</entry>
</row>
<row>
<entry>
<para>vector bool long<footnoteref linkend="vlong" /></para>
<para>vector bool long long</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 2 doublewords with a value of either 0 or
2<superscript>64</superscript> – 1.</para>
</entry>
</row>
<row>
<entry>
<para>vector unsigned __int128</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 1 unsigned quadword.</para>
</entry>
</row>
<row>
<entry>
<para>vector signed __int128</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 1 signed quadword.</para>
</entry>
</row>
<row>
<entry>
<para>vector float</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 4 single-precision floats.</para>
</entry>
</row>
<row>
<entry>
<para>vector double</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 2 double-precision floats.</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
</section>

<section>
<title>Vector Operators</title>
<para>
In addition to the dereference and assignment operators, the
Power Bi-Endian Vector Programming Model provides the usual
operators that are valid on pointers; these operators are also
valid for pointers to vector types.
</para>
<para>
The traditional C/C++ operators are defined on vector types
for unary and binary <code>+</code>,
unary and binary <code>-</code>, binary <code>*</code>, binary
<code>%</code>, and binary <code>/</code>, as well as the unary
and binary shift, logical, and comparison operators, and the
ternary <code>?:</code> operator. These operators perform their
operations "elementwise" on the base elements of the operands,
as follows.
</para>
<para>
For unary operators, the specified operation is performed on
each base element of the single operand to derive the result
value placed into the corresponding element of the vector
result. The result type of unary operations is the type of the
single operand. For example,
</para>
<programlisting>vector signed int a, b;
a = -b;</programlisting>
<para>
produces the same result as
</para>
<programlisting>vector signed int a, b;
a = vec_neg (b);</programlisting>
<para>
For binary operators, the specified operation is performed on
corresponding base elements of both operands to derive the
result value for each vector element of the vector result. Both
operands of the binary operators must have the same vector type
with the same base element type. The result of binary operators
is the same type as the type of the operands. For example,
</para>
<programlisting>vector signed int a, b;
a = a + b;</programlisting>
<para>
produces the same result as
</para>
<programlisting>vector signed int a, b;
a = vec_add (a, b);</programlisting>
<para>
Further, the array reference operator may be applied to vector
data types, yielding an l-value corresponding to the specified
element in accordance with the vector element numbering rules (see
<xref linkend="VIPR.biendian.layout" />). An l-value may either
be assigned a new value or accessed for reading its value. For
example,
</para>
<programlisting>vector signed int a;
signed int b, c;
b = a[0];
a[3] = c;</programlisting>
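<para>
The elementwise behavior of these operators can be tried on any host
with the generic <code>vector_size</code> attribute of GCC and
Clang, used here as an assumed stand-in for <code>vector signed
int</code>. This is a sketch only: the type name <code>v4i32</code>
and the three helper functions are hypothetical, and the Power
built-in functions <code>vec_neg</code> and <code>vec_add</code> are
not called.
</para>

```c
#include <stdint.h>

/* Generic vector of four signed words (GCC/Clang extension). */
typedef int32_t v4i32 __attribute__((vector_size(16)));

/* Unary minus negates every element. */
static v4i32 negate(v4i32 b)
{
    return -b;
}

/* Binary plus adds corresponding elements. */
static v4i32 add(v4i32 a, v4i32 b)
{
    return a + b;
}

/* The array reference operator yields an l-value for one element;
   the caller's vector is unchanged because a is passed by value. */
static v4i32 set_element(v4i32 a, int i, int32_t c)
{
    a[i] = c;
    return a;
}
```

<para>
For instance, adding <code>{1, 2, 3, 4}</code> and <code>{10, 20,
30, 40}</code> with <code>add</code> gives <code>{11, 22, 33,
44}</code>.
</para>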
</section>

<section xml:id="VIPR.biendian.layout">
<title>Vector Layout and Element Numbering</title>
<para>
Vector data types consist of a homogeneous sequence of elements
of the base data type specified in the vector data
type. Individual elements of a vector can be addressed by a
vector element number. To understand how vector elements are
represented in memory and in registers, it is best to start with
some simple concepts of endianness.
</para>
<figure pgwide="1" xml:id="scalar-endian">
<title>Scalar Quantities and Endianness</title>
<mediaobject>
<imageobject>
<imagedata fileref="Scalar-endian.png" format="PNG"
scalefit="1" width="100%" />
</imageobject>
</mediaobject>
</figure>
<para>
<xref linkend="scalar-endian" /> shows different representations
of a 64-bit scalar integer with the hexadecimal value
<code>0x0123456789ABCDEF</code>. We say that the most
significant byte (MSB) of this value is <code>0x01</code>, and
its least significant byte (LSB) is <code>0xEF</code>. The scalar
value is stored using eight bytes of memory. On a little-endian
(LE) system, the LSB is stored at the lowest address of these
eight bytes, and the MSB is stored at the highest address. On a
big-endian (BE) system, the MSB is stored at the lowest address
of these eight bytes, and the LSB is stored at the highest
address. Regardless of the memory order, the register
representation of the scalar value is identical; the MSB is
located on the "left" end of the register, and the LSB is
located on the "right" end.
</para>
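<para>
The memory order of the scalar can be observed directly on whatever
host the code is compiled for. The following minimal sketch copies
the eight bytes out in address order; the helper names
<code>scalar_bytes</code> and <code>host_is_little_endian</code> are
hypothetical.
</para>

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of a 64-bit scalar, lowest address first. */
static void scalar_bytes(uint64_t value, unsigned char out[8])
{
    memcpy(out, &value, 8);
}

/* Nonzero if the byte at the lowest address of a multi-byte scalar
   is its least significant byte. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

<para>
For the value <code>0x0123456789ABCDEF</code>, the first byte
returned is <code>0xEF</code> on an LE host and <code>0x01</code> on
a BE host, while the value held in a register is the same in both
cases.
</para>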
<para>
Of course, the concept of "left" and "right" is a useful
fiction; there is no guarantee that the circuitry of a hardware
register is laid out this way. However, we will see, as we deal
with vector elements, that the concepts of left and right are
more natural for human understanding than byte and element
significance. Indeed, most programming languages have
operators, such as shift-left and shift-right, that use this
same terminology.
</para>
<para>
Let's move from scalars to arrays, which are more interesting to
us since we can use vector registers to operate on arrays, or
portions of larger arrays. Suppose we
have an array of bytes with values 0 through 15, as shown in
<xref linkend="byte-array-endian" />. Note that each byte is a
separate data element with only one possible representation in
memory, so the array of bytes looks identical in memory,
regardless of whether we are using a BE system or an LE system.
But when we load these 16 bytes into a vector register, perhaps
by using the ISA 3.0 <emphasis role="bold">lxv</emphasis>
instruction, the byte at the lowest address on an LE system will
be placed in the LSB of the vector register, but on a BE system
will be placed in the MSB of the vector register. Thus the
array elements appear "right to left" in the register on an LE
system, and "left to right" in the register on a BE system.
</para>
<figure pgwide="1" xml:id="byte-array-endian">
<title>Byte Arrays and Endianness</title>
<mediaobject>
<imageobject>
<imagedata fileref="Byte-array-endian.png" format="PNG"
scalefit="1" width="100%" />
</imageobject>
</mediaobject>
</figure>
<para>
Things become even more interesting when we consider arrays of
larger elements. In <xref linkend="word-array-endian" />, we
see the layout of an array of four 32-bit integers, where the 0th
element has hexadecimal value <code>0x00010203</code>, the 1st
element has value <code>0x04050607</code>, the 2nd element has
value <code>0x08090A0B</code>, and the 3rd element has value
<code>0x0C0D0E0F</code>. The order of the array elements in
memory is the same for both LE and BE systems; but the layout of
each element itself is reversed. When the <emphasis
role="bold">lxv</emphasis> instruction is used to load the
memory into a vector register, again the low address is loaded
into the LSB of the register for LE, but loaded into the MSB of
the register for BE. The effect is that the array elements
again appear right-to-left on an LE system and left-to-right on a
BE system. Note that each 32-bit element of the array has its
most significant bit "on the left" whether an LE or BE system is
in use. This is of course necessary for proper arithmetic to be
performed on the array elements by vector instructions.
</para>
<figure pgwide="1" xml:id="word-array-endian">
<title>Word Arrays and Endianness</title>
<mediaobject>
<imageobject>
<imagedata fileref="Word-array-endian.png" format="PNG"
scalefit="1" width="100%" />
</imageobject>
</mediaobject>
</figure>
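<para>
The word-array layout can be checked with a small host-independent
model. This is only a sketch under stated assumptions: the 128-bit
register is modeled as an <code>unsigned __int128</code> (a GCC and
Clang extension) whose most significant bit is the "left" end, the
<code>little_endian</code> flag stands in for the processor's endian
mode, and the names <code>reg128</code>,
<code>load_quadword</code>, and <code>word_from_right</code> are
hypothetical. The model does not execute <emphasis
role="bold">lxv</emphasis> itself.
</para>

```c
#include <stdint.h>

/* Model of a 128-bit register; bit 127 is the "left" end. */
typedef unsigned __int128 reg128;

/* Model of a 16-byte load. In little-endian mode the byte at the
   lowest address goes to the LSB of the register; in big-endian
   mode it goes to the MSB. */
static reg128 load_quadword(const unsigned char mem[16], int little_endian)
{
    reg128 r = 0;
    for (int i = 0; i < 16; i++) {
        int shift = little_endian ? 8 * i : 8 * (15 - i);
        r |= (reg128)mem[i] << shift;
    }
    return r;
}

/* The 32-bit word found k word positions from the right end. */
static uint32_t word_from_right(reg128 r, int k)
{
    return (uint32_t)(r >> (32 * k));
}
```

<para>
Loading the BE memory image of the four-word array places element 0
(<code>0x00010203</code>) in the leftmost word, whereas loading the
LE memory image places it in the rightmost word. In both cases each
word reads with its most significant byte on the left.
</para>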

<!-- Element numbers can be established either
by counting from the “left” of a register and assigning the
left-most element the element number 0, or from the “right” of
the register and assigning the right-most element the element
number 0.
</para>
-->
<para>
Thus on a BE system, we number vector elements starting with 0
on the left, while on an LE system, we number vector elements
starting with 0 on the right. We will informally refer to these
as big-endian and little-endian vector element numberings and
vector layouts.
</para>
<para>
This element numbering shall also be used by the <code>[]</code>
accessor method to vector elements provided as an extension of
the C/C++ languages by some compilers, as well as for other
language extensions or library constructs that directly or
indirectly refer to elements by their element number.
</para>
<para>
Application programs may query the vector element ordering in
use by testing the <code>__VEC_ELEMENT_REG_ORDER__</code> macro. This macro
has two possible values:
</para>
<informaltable frame="none" rowsep="0" colsep="0">
<tgroup cols="2">
<colspec colname="c1" colwidth="40*" />
<colspec colname="c2" colwidth="60*" />
<tbody>
<row>
<entry>
<para>__ORDER_LITTLE_ENDIAN__</para>
</entry>
<entry>
<para>Vector elements use little-endian element ordering.</para>
</entry>
</row>
<row>
<entry>
<para>__ORDER_BIG_ENDIAN__</para>
</entry>
<entry>
<para>Vector elements use big-endian element ordering.</para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>
This macro is no longer as useful as it once was. Its primary use
case was big-endian vector layout in little-endian
environments, which is now deprecated as discussed in <xref
linkend="VIPR.biendian.BELE" />. It is generally equivalent to
test for <code>__BIG_ENDIAN__</code> or
<code>__LITTLE_ENDIAN__</code>.
</para>
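<para>
One way such a test might be written is sketched below. The
fallback to <code>__BYTE_ORDER__</code> is an assumption made so
that the fragment also compiles with GCC- or Clang-compatible
compilers that do not define
<code>__VEC_ELEMENT_REG_ORDER__</code>; the names
<code>VEC_ORDER</code> and
<code>elements_numbered_from_right</code> are hypothetical.
</para>

```c
/* Prefer the vector element order macro when the compiler defines
   it; otherwise assume it matches the byte order of the target. */
#if defined(__VEC_ELEMENT_REG_ORDER__)
#  define VEC_ORDER __VEC_ELEMENT_REG_ORDER__
#else
#  define VEC_ORDER __BYTE_ORDER__
#endif

/* Nonzero when element 0 is on the "right" of the register, that
   is, when little-endian element ordering is in use. */
static int elements_numbered_from_right(void)
{
#if VEC_ORDER == __ORDER_LITTLE_ENDIAN__
    return 1;
#else
    return 0;
#endif
}
```

<para>
Code that needs different constants or element indices for the two
orderings can branch on the same preprocessor condition instead of
calling a function.
</para>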
<note>
<para>
Remember that each element in a vector has the same representation
in both big- and little-endian element orders. That is, an
<code>int</code> is always 32 bits, with the sign bit in the
high-order position. Programmers must be aware of this when
programming with mixed data types, such as an instruction that
multiplies two <code>short</code> elements to produce an
<code>int</code> element. Always access entire elements to
avoid potential endianness issues.
</para>
</note>
</section>

<section>
<title>Vector Built-In Functions</title>
<para>
Some of the Power SIMD hardware instructions refer, implicitly
or explicitly, to vector element numbers. For example, the
<code>vspltb</code> instruction has as one of its inputs an
index into a vector. The element at that index position is to
be replicated in every element of the output vector. For
another example, the <code>vmuleuh</code> instruction operates on
the even-numbered elements of its input vectors. The hardware
instructions define these element numbers using big-endian
element order, even when the machine is running in little-endian
mode. Thus, a built-in function that maps directly to the
underlying hardware instruction, regardless of the target
endianness, has the potential to confuse programmers on
little-endian platforms.
</para>
<para>
It is more useful to define the built-in functions that map to
these instructions so that they use natural element order. That
is, the explicit or implicit element numbers specified by such
built-in functions should be interpreted using big-endian
element order on a big-endian platform, and using little-endian
element order on a little-endian platform.
</para>
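<para>
The relationship between a natural element number and the
big-endian element number that a hardware instruction expects can
be sketched as a simple index reversal. The helper
<code>hw_element_index</code> is hypothetical and is not a built-in
function; it only illustrates the arithmetic a compiler might apply
when targeting little-endian mode.
</para>

```c
/* Hypothetical helper: translate a natural element number into the
   big-endian element number used by the hardware instructions.
   num_elements is the element count of the vector type
   (for example, 16 for bytes or 8 for halfwords). */
static int hw_element_index(int natural_index, int num_elements,
                            int little_endian)
{
    return little_endian ? (num_elements - 1) - natural_index
                         : natural_index;
}
```

<para>
Under this mapping, natural element 0 of a 16-byte vector
corresponds to hardware element 15 in little-endian mode. Because
the element count is even, an even natural element number maps to
an odd hardware element number there, which suggests why an
even-element operation may need a different instruction sequence on
LE than on BE.
</para>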
<para>
The descriptions of the built-in functions in <xref
linkend="VIPR.vec-ref" /> contain notes on endian issues that
apply to each built-in function. Furthermore, a built-in
function requiring a different compiler implementation for
big-endian than it uses for little-endian has a sample
compiler implementation for both BE and LE. These sample
implementations are only intended as examples; designers of a
compiler are free to use other methods to implement the
specified semantics.
</para>
<section>
<title>Extended Data Movement Functions</title>
<para>
The built-in functions in <xref
linkend="VIPR.biendian.vmx-mem" /> map to AltiVec/VMX load and
store instructions and provide access to the “auto-aligning”
memory instructions of the VMX ISA, where low-order address
bits are discarded before performing a memory access. These
instructions load and store data in accordance with the
program's current endian mode, and do not need to be adapted
by the compiler to reflect little-endian operation during code
generation.
</para>
<table frame="all" pgwide="1" xml:id="VIPR.biendian.vmx-mem">
<title>VMX Memory Access Built-In Functions</title>
<tgroup cols="3">
<colspec colname="c1" colwidth="15*" align="center" />
<colspec colname="c2" colwidth="35*" align="center" />
<colspec colname="c3" colwidth="50*" />
<thead>
<row>
<entry>
<para>
<emphasis role="bold">Built-in Function</emphasis>
</para>
</entry>
<entry>
<para>
<emphasis role="bold">Corresponding Power
Instructions</emphasis>
</para>
</entry>
<entry align="center">
<para>
<emphasis role="bold">Implementation Notes</emphasis>
</para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<para>vec_ld</para>
</entry>
<entry>
<para>lvx</para>
</entry>
<entry>
<para>Hardware works as a function of endian mode.</para>
</entry>
</row>
<row>
<entry>
<para>vec_lde</para>
</entry>
<entry>
<para>lvebx, lvehx, lvewx</para>
</entry>
<entry>
<para>Hardware works as a function of endian mode.</para>
</entry>
</row>
<row>
<entry>
<para>vec_ldl</para>
</entry>
<entry>
<para>lvxl</para>
</entry>
<entry>
<para>Hardware works as a function of endian mode.</para>
</entry>
</row>
<row>
<entry>
<para>vec_st</para>
</entry>
<entry>
<para>stvx</para>
</entry>
<entry>
<para>Hardware works as a function of endian mode.</para>
</entry>
</row>
<row>
<entry>
<para>vec_ste</para>
</entry>
<entry>
<para>stvebx, stvehx, stvewx</para>
</entry>
<entry>
<para>Hardware works as a function of endian mode.</para>
</entry>
</row>
<row>
<entry>
<para>vec_stl</para>
</entry>
<entry>
<para>stvxl</para>
</entry>
<entry>
<para>Hardware works as a function of endian mode.</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
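<para>
The "auto-aligning" behavior of the full-quadword forms can be
modeled as a mask applied to the address. The sketch below covers
only the quadword loads and stores (<code>lvx</code>,
<code>lvxl</code>, <code>stvx</code>, and <code>stvxl</code>); the
element forms are not modeled. The helper name
<code>vmx_effective_address</code> is hypothetical.
</para>

```c
#include <stdint.h>

/* Hypothetical model of the address actually accessed by a
   quadword VMX load or store: the four low-order address bits are
   discarded, so the access always starts on a 16-byte boundary. */
static uintptr_t vmx_effective_address(uintptr_t addr)
{
    return addr & ~(uintptr_t)15;
}
```

<para>
Under this model, a request at address <code>0x1007</code> accesses
the quadword that starts at <code>0x1000</code>. No alignment
exception is raised; the bytes returned are simply not the 16 bytes
starting at the requested address.
</para>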
<para>
Before the bi-endian programming model was introduced, the
<code>vec_lvsl</code> and <code>vec_lvsr</code> intrinsics
were supported. These could be used in conjunction with
<code>vec_perm</code> and VMX load and store instructions for
unaligned access. The <code>vec_lvsl</code> and
<code>vec_lvsr</code> interfaces are deprecated in favor of
the interfaces specified here. For compatibility, the
built-in pseudo sequences published in previous VMX documents
continue to work with little-endian data layout and the
little-endian vector layout described in this document.
However, the use of these sequences in new code is discouraged
and usually results in worse performance. It is recommended
that compilers issue a warning when these functions are used
in little-endian environments.
</para>
|
|
|
<para>
|
|
|
Instead, it is recommended that programmers use the
|
|
|
<code>vec_xl</code> and <code>vec_xst</code> vector built-in
|
|
|
functions to access unaligned data streams. See the
|
|
|
descriptions of these built-in functions in <xref
|
|
|
linkend="VIPR.vec-ref" /> for further information and
|
|
|
implementation details.
|
|
|
</para>
|
|
|
</section>
|
|
|
<section xml:id="VIPR.biendian.BELE">
|
|
|
<title>Big-Endian Vector Layout in Little-Endian Environments
|
|
|
(Deprecated)</title>
|
|
|
<para>
|
|
|
Versions 1.0 through 1.4 of the 64-Bit ELFv2 ABI Specification
|
|
|
for Power provided for optional compiler support for using
|
|
|
big-endian element ordering in little-endian environments.
|
|
|
This was initially deemed useful for porting certain libraries
|
|
|
that assumed big-endian element ordering regardless of the
|
|
|
endianness of their input streams. In practice, this
|
|
|
introduced serious compiler complexity without much utility.
|
|
|
Thus this support (previously controlled by switches
|
|
|
<code>-maltivec=be</code> and/or <code>-qaltivec=be</code>) is
|
|
|
now deprecated. Current versions of the GCC and Clang
|
|
|
open-source compilers do not implement this support.
|
|
|
</para>
|
|
|
</section>
|
|
|
</section>
|
|
|
|
|
|
<section>
|
|
|
<title>Language-Specific Vector Support for Other
|
|
|
Languages</title>
|
|
|
<section>
|
|
|
<title>Fortran</title>
|
|
|
<para>
|
|
|
<xref linkend="VIPR.biendian.fortran-types" /> shows the
|
|
|
correspondence between the C/C++ types described in this
|
|
|
document and their Fortran equivalents. In Fortran, the
|
|
|
Boolean vector data types are represented by
|
|
|
<code>VECTOR(UNSIGNED(</code><emphasis>n</emphasis><code>))</code>.
|
|
|
</para>
|
|
|
<table frame="all" pgwide="1" xml:id="VIPR.biendian.fortran-types">
|
|
|
<title>Fortran Vector Data Types</title>
|
|
|
<tgroup cols="2">
|
|
|
<colspec colname="c1" colwidth="50*" />
|
|
|
<colspec colname="c2" colwidth="50*" />
|
|
|
<thead>
|
|
|
<row>
|
|
|
<entry align="center">
|
|
|
<para>
|
|
|
<emphasis role="bold">XL Fortran Vector Type</emphasis>
|
|
|
</para>
|
|
|
</entry>
|
|
|
<entry align="center">
|
|
|
<para>
|
|
|
<emphasis role="bold">XL C/C++ Vector Type</emphasis>
|
|
|
</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
</thead>
|
|
|
<tbody>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(INTEGER(1))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector signed char</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(INTEGER(2))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector signed short</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(INTEGER(4))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector signed int</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(INTEGER(8))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector signed long long, vector signed long<footnote
|
|
|
xml:id="vlongappalling">
|
|
|
<para>The vector long types are deprecated due to their
|
|
|
ambiguity between 32-bit and 64-bit environments. The use
|
|
|
of the vector long long types is preferred.</para>
|
|
|
</footnote></para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(INTEGER(16))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector signed __int128</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(UNSIGNED(1))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector unsigned char</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(UNSIGNED(2))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector unsigned short</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(UNSIGNED(4))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector unsigned int</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(UNSIGNED(8))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector unsigned long long, vector unsigned long<footnoteref
|
|
|
linkend="vlongappalling" /></para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(UNSIGNED(16))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector unsigned __int128</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(REAL(4))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector float</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(REAL(8))</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector double</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VECTOR(PIXEL)</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector pixel</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
</tbody>
|
|
|
</tgroup>
|
|
|
</table>
|
|
|
<para>
|
|
|
Because the Fortran language does not support pointers, vector
|
|
|
built-in functions that expect pointers to a base type take an
|
|
|
array element reference to indicate the address of a memory
|
|
|
location that is the subject of a memory access built-in
|
|
|
function.
|
|
|
</para>
|
|
|
<para>
|
|
|
Because the Fortran language does not support type casts, the
|
|
|
<code>vec_convert</code> and <code>vec_concat</code> built-in
|
|
|
functions shown in <xref linkend="VIPR.endian.convert" /> are
|
|
|
provided to perform bit-exact type conversions between vector
|
|
|
types.
|
|
|
</para>
|
|
|
<table frame="all" pgwide="1" xml:id="VIPR.endian.convert">
|
|
|
<title>Built-In Vector Conversion Functions</title>
|
|
|
<tgroup cols="2">
|
|
|
<colspec colname="c1" colwidth="30*" align="center" />
|
|
|
<colspec colname="c2" colwidth="70*" />
|
|
|
<thead>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>
|
|
|
<emphasis role="bold">Group</emphasis>
|
|
|
</para>
|
|
|
</entry>
|
|
|
<entry align="center">
|
|
|
<para>
|
|
|
<emphasis role="bold">Description</emphasis>
|
|
|
</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
</thead>
|
|
|
<tbody>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VEC_CONCAT (ARG1, ARG2)<?linebreak?>(Fortran)</para>
|
|
|
<para></para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Purpose:</para>
|
|
|
<para>Concatenates two elements to form a vector.</para>
|
|
|
<para>Result value:</para>
|
|
|
<para>The resulting vector consists of the two scalar elements,
|
|
|
ARG1 and ARG2, assigned to elements 0 and 1 (using the
|
|
|
environment’s native endian numbering), respectively.</para>
|
|
|
<itemizedlist>
|
|
|
<listitem>
|
|
|
<para><emphasis role="bold">Note: </emphasis>This function corresponds to the C/C++ vector
|
|
|
constructor (vector type){a,b}. It is provided only for
|
|
|
languages without vector constructors.</para>
|
|
|
</listitem>
|
|
|
</itemizedlist>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para></para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector signed long long vec_concat (signed long long,
|
|
|
signed long long);</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para></para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector unsigned long long vec_concat (unsigned long long,
|
|
|
unsigned long long);</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para></para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>vector double vec_concat (double, double);</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
<row>
|
|
|
<entry>
|
|
|
<para>VEC_CONVERT(V, MOLD)</para>
|
|
|
</entry>
|
|
|
<entry>
|
|
|
<para>Purpose:</para>
|
|
|
<para>Converts a vector to a vector of a given type.</para>
|
|
|
<para>Class:</para>
|
|
|
<para>Pure function</para>
|
|
|
<para>Argument type and attributes:</para>
|
|
|
<itemizedlist spacing="compact">
|
|
|
<listitem>
|
|
|
<para>V Must be an INTENT(IN) vector.</para>
|
|
|
</listitem>
|
|
|
<listitem>
|
|
|
<para>MOLD Must be an INTENT(IN) vector. If it is a
|
|
|
variable, it need not be defined.</para>
|
|
|
</listitem>
|
|
|
</itemizedlist>
|
|
|
<para>Result type and attributes:</para>
|
|
|
<para>The result is a vector of the same type as MOLD.</para>
|
|
|
<para>Result value:</para>
|
|
|
<para>The result is as if it were on the left-hand side of an
|
|
|
intrinsic assignment with V on the right-hand side.</para>
|
|
|
</entry>
|
|
|
</row>
|
|
|
</tbody>
|
|
|
</tgroup>
|
|
|
</table>
|
|
|
</section>
|
|
|
</section>
|
|
|
|
|
|
<section>
|
|
|
<title>Examples and Limitations</title>
|
|
|
<section xml:id="VIPR.biendian.unaligned">
|
|
|
<title>Unaligned vector access</title>
|
|
|
<para>
|
|
|
A common programming error is to cast a pointer to a base type
|
|
|
(such as <code>int</code>) to a pointer of the corresponding
|
|
|
vector type (such as <code>vector int</code>), and then
|
|
|
dereference the pointer. This constitutes undefined behavior,
|
|
|
because it casts a pointer with a smaller alignment
|
|
|
requirement to a pointer with a larger alignment requirement.
|
|
|
Compilers may not produce the code that you expect in the presence
|
|
|
of undefined behavior.
|
|
|
</para>
|
|
|
<para>
|
|
|
Thus, do not write the following:
|
|
|
</para>
|
|
|
<programlisting> int a[4096];
|
|
|
vector int x = *((vector int *) a);</programlisting>
|
|
|
<para>
|
|
|
Instead, write this:
|
|
|
</para>
|
|
|
<programlisting> int a[4096];
|
|
|
vector int x = vec_xl (0, a);</programlisting>
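<para>
As a slightly larger illustration, the following sketch adds two
arrays whose addresses need not be 16-byte aligned. It is not taken
from this reference: the function name
<code>add_uint_arrays</code> is hypothetical, and the
<code>__VSX__</code> test assumes a GCC- or Clang-compatible
compiler. The scalar loop handles the tail, and it is also the
whole implementation when VSX support is unavailable.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

#if defined(__VSX__)
#include <altivec.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n).
   None of the pointers needs to be 16-byte aligned. */
void add_uint_arrays(unsigned int *dst, const unsigned int *a,
                     const unsigned int *b, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && a != NULL && b != NULL);

#if defined(__VSX__)
    /* Four elements per iteration; vec_xl and vec_xst tolerate
       unaligned addresses, so no pointer cast is required. */
    for (; i + 4 <= n; i += 4) {
        __vector unsigned int va = vec_xl(0, a + i);
        __vector unsigned int vb = vec_xl(0, b + i);
        vec_xst(vec_add(va, vb), 0, dst + i);
    }
#endif

    /* Scalar tail; also the complete loop when VSX is unavailable. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```
]]></programlisting>
<para>
The first argument of <code>vec_xl</code> and the second argument
of <code>vec_xst</code> are byte offsets, so the sketch applies the
element index to the pointer and passes an offset of zero.
</para>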
|
|
|
</section>
|
|
|
<section xml:id="VIPR.biendian.sld">
|
|
|
<title>vec_sld and vec_sro are not bi-endian</title>
|
|
|
<para>
|
|
|
One oddity in the bi-endian vector programming model is that
|
|
|
<code>vec_sld</code> has big-endian semantics for code
|
|
|
compiled for both big-endian and little-endian targets. As a
result, any code that uses <code>vec_sld</code> without guarding
|
|
|
it with a test on endianness is likely to be incorrect.
|
|
|
</para>
|
|
|
<para>
|
|
|
At the time that the bi-endian model was being developed, it
|
|
|
was discovered that existing code in several Linux packages
|
|
|
was using <code>vec_sld</code> in order to perform multiplies,
|
|
|
or to otherwise shift portions of base elements left. A
|
|
|
straightforward little-endian implementation of
|
|
|
<code>vec_sld</code> would concatenate the two input vectors
|
|
|
in reverse order and shift bytes to the right. This would
|
|
|
only give compatible results for <code>vector char</code>
|
|
|
types. Those using this intrinsic as a cheap multiply, or to
|
|
|
shift bytes within larger elements, would see different
|
|
|
results on little-endian versus big-endian with such an
|
|
|
implementation. Therefore it was decided that
|
|
|
<code>vec_sld</code> would not have a bi-endian
|
|
|
implementation.
|
|
|
</para>
|
|
|
<para>
|
|
|
<code>vec_sro</code> is not bi-endian for similar reasons.
|
|
|
</para>
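<para>
The difference can be illustrated with a small portable model that
operates on 16-byte register images rather than on real vector
registers. This is only a sketch under stated assumptions: all
function names are hypothetical, <code>sld_register</code> models
the big-endian semantics that <code>vec_sld</code> has on both
targets, and <code>sld_le_element_order</code> models the
straightforward little-endian implementation that was not adopted.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a register image (leftmost byte first) from four 32-bit
   elements. For little endian, element 0 sits at the right end. */
void reg_from_words(const uint32_t e[4], int little, uint8_t reg[16])
{
    for (int i = 0; i < 4; i++) {
        int slot = little ? 3 - i : i;
        for (int k = 0; k < 4; k++)
            reg[4 * slot + k] = (uint8_t)(e[i] >> (24 - 8 * k));
    }
}

/* Inverse of reg_from_words. */
void words_from_reg(const uint8_t reg[16], int little, uint32_t e[4])
{
    for (int i = 0; i < 4; i++) {
        int slot = little ? 3 - i : i;
        e[i] = 0;
        for (int k = 0; k < 4; k++)
            e[i] = (e[i] << 8) | reg[4 * slot + k];
    }
}

/* Big-endian semantics: take 16 bytes starting at byte n (0..15)
   of the 32-byte concatenation a || b. */
void sld_register(const uint8_t a[16], const uint8_t b[16],
                  unsigned n, uint8_t out[16])
{
    uint8_t cat[32];

    assert(n < 16);
    memcpy(cat, a, 16);
    memcpy(cat + 16, b, 16);
    memcpy(out, cat + n, 16);
}

/* Assumed model of the rejected little-endian variant: concatenate
   in reverse order (b || a) and shift right by n bytes. */
void sld_le_element_order(const uint8_t a[16], const uint8_t b[16],
                          unsigned n, uint8_t out[16])
{
    if (n == 0)
        memmove(out, a, 16);
    else
        sld_register(b, a, 16 - n, out);
}
```
]]></programlisting>
<para>
Shifting the integer vector {1, 2, 3, 4} left by one byte against a
zero vector with <code>sld_register</code> gives
{0x100, 0x200, 0x300, 0x400} for both register layouts, which is
the cheap multiply that existing code relied on. With
<code>sld_le_element_order</code> on the little-endian layout, the
same inputs give {0x02000000, 0x03000000, 0x04000000, 0}.
</para>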
|
|
|
</section>
|
|
|
<section xml:id="VIPR.biendian.vperm">
|
|
|
<title>Limitations on bi-endianness of vec_perm</title>
|
|
|
<para>
|
|
|
The <code>vec_perm</code> intrinsic is bi-endian, provided
|
|
|
that it is used to reorder entire elements of the input
|
|
|
vectors.
|
|
|
</para>
|
|
|
<para>
|
|
|
To see why, consider the code generation for the following code:
|
|
|
</para>
|
|
|
<programlisting> vector int t;
|
|
|
vector int a = (vector int){0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f};
|
|
|
vector int b = (vector int){0x10111213, 0x14151617, 0x18191a1b, 0x1c1d1e1f};
|
|
|
vector char c = (vector char){0,1,2,3,28,29,30,31,12,13,14,15,20,21,22,23};
|
|
|
t = vec_perm (a, b, c);</programlisting>
|
|
|
<para>
|
|
|
For big endian, a compiler should generate:
|
|
|
</para>
|
|
|
<programlisting> vperm t,a,b,c</programlisting>
|
|
|
<para>
|
|
|
For little endian targeting a POWER8 system, a compiler should
|
|
|
generate:
|
|
|
</para>
|
|
|
<programlisting> vnand d,c,c
|
|
|
vperm t,b,a,d</programlisting>
|
|
|
<para>
|
|
|
For little endian targeting a POWER9 system, a compiler should
|
|
|
generate:
|
|
|
</para>
|
|
|
<programlisting> vpermr t,b,a,c</programlisting>
|
|
|
<para>
|
|
|
Note that the <code>vpermr</code> instruction itself performs the
modification of the permute control vector (PCV) <code>c</code>
that requires a separate <code>vnand</code> instruction for POWER8.
The <code>vnand</code> complements each element of the PCV.
Because only the bottom 5 bits of each element of the PCV are
read by the hardware, this has the effect of subtracting the
original elements of the PCV from 31.
|
|
|
</para>
|
|
|
<para>
|
|
|
Note also that the PCV <code>c</code> has element values that
|
|
|
are contiguous in groups of 4. This selects entire elements
|
|
|
from the input vectors <code>a</code> and <code>b</code> to
|
|
|
reorder. Thus the intent of the code is to select the first
|
|
|
integer element of <code>a</code>, the last integer element of
|
|
|
<code>b</code>, the last integer element of <code>a</code>,
|
|
|
and the second integer element of <code>b</code>, in that
|
|
|
order.
|
|
|
</para>
|
|
|
<para>
|
|
|
For little endian, each element of the PCV is subtracted from 31,
giving the modified PCV {31,30,29,28,3,2,1,0,19,18,17,16,11,10,9,8}.
|
|
|
Since the elements appear in reverse order in a register when
|
|
|
loaded from little-endian memory, the elements appear in the
|
|
|
register from left to right as
|
|
|
{8,9,10,11,16,17,18,19,0,1,2,3,28,29,30,31}. So the following
|
|
|
<code>vperm</code> instruction will again select entire
|
|
|
elements using the groups of 4 contiguous bytes, and the
|
|
|
values of the integers will be reordered without compromising
|
|
|
each integer's contents. The fact that the little-endian
|
|
|
result matches the big-endian result is left as an exercise
|
|
|
for the reader.
|
|
|
</para>
|
|
|
<para>
|
|
|
Now, suppose instead that the original PCV does not reorder
|
|
|
entire integers at once:
|
|
|
</para>
|
|
|
<programlisting> vector char c = (vector char){0,20,31,4,7,17,6,19,30,3,2,8,9,13,5,22};</programlisting>
|
|
|
<para>
|
|
|
The result of the big-endian implementation would be:
|
|
|
</para>
|
|
|
<programlisting> t = {0x00141f04, 0x07110613, 0x1e030208, 0x090d0516};</programlisting>
|
|
|
<para>
|
|
|
For little-endian, the modified PCV would be
|
|
|
{31,11,0,27,24,14,25,12,1,28,29,23,22,18,26,9}, appearing in
|
|
|
the register as
|
|
|
{9,26,18,22,23,29,28,1,12,25,14,24,27,0,11,31}. The final
|
|
|
little-endian result would be
|
|
|
</para>
|
|
|
<programlisting> t = {0x071c1703, 0x10051204, 0x0b01001d, 0x15060e0a};</programlisting>
|
|
|
<para>
|
|
|
which bears no resemblance to the big-endian result.
|
|
|
</para>
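<para>
Both cases can be checked without Power hardware by using a
portable model of the instruction sequences shown above. The
following is a sketch only: the function names are hypothetical,
vectors are modeled as 16-byte register images, and the
little-endian path models the POWER8 sequence (<code>vnand</code>
followed by <code>vperm</code> with swapped inputs).
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Build a register image (leftmost byte first) from four 32-bit
   elements. For little endian, element 0 sits at the right end. */
void reg_from_words(const uint32_t e[4], int little, uint8_t reg[16])
{
    for (int i = 0; i < 4; i++) {
        int slot = little ? 3 - i : i;
        for (int k = 0; k < 4; k++)
            reg[4 * slot + k] = (uint8_t)(e[i] >> (24 - 8 * k));
    }
}

/* Inverse of reg_from_words. */
void words_from_reg(const uint8_t reg[16], int little, uint32_t e[4])
{
    for (int i = 0; i < 4; i++) {
        int slot = little ? 3 - i : i;
        e[i] = 0;
        for (int k = 0; k < 4; k++)
            e[i] = (e[i] << 8) | reg[4 * slot + k];
    }
}

/* Build a register image from sixteen byte elements. */
void reg_from_bytes(const uint8_t e[16], int little, uint8_t reg[16])
{
    for (int i = 0; i < 16; i++)
        reg[little ? 15 - i : i] = e[i];
}

/* vperm: result byte i is byte (c[i] mod 32) of a || b. */
void vperm(const uint8_t a[16], const uint8_t b[16],
           const uint8_t c[16], uint8_t t[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned sel = c[i] & 31u;
        t[i] = sel < 16 ? a[sel] : b[sel - 16];
    }
}

/* Model of vec_perm on integer vectors for either endianness. */
void vec_perm_model(const uint32_t a[4], const uint32_t b[4],
                    const uint8_t pcv[16], int little, uint32_t out[4])
{
    uint8_t ra[16], rb[16], rc[16], rt[16];

    assert(little == 0 || little == 1);
    reg_from_words(a, little, ra);
    reg_from_words(b, little, rb);
    reg_from_bytes(pcv, little, rc);
    if (little) {
        for (int i = 0; i < 16; i++)
            rc[i] = (uint8_t)~rc[i];    /* vnand d,c,c   */
        vperm(rb, ra, rc, rt);          /* vperm t,b,a,d */
    } else {
        vperm(ra, rb, rc, rt);          /* vperm t,a,b,c */
    }
    words_from_reg(rt, little, out);
}
```
]]></programlisting>
<para>
With the first PCV, <code>vec_perm_model</code> returns
{0x00010203, 0x1c1d1e1f, 0x0c0d0e0f, 0x14151617} for both values of
<code>little</code>. With the second PCV, it reproduces the two
differing results shown above.
</para>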
|
|
|
<para>
|
|
|
The lesson here is to use <code>vec_perm</code> only to
reorder entire elements of a vector. If you must use <code>vec_perm</code>
|
|
|
for another purpose, your code must include a test for
|
|
|
endianness and separate algorithms for big- and
|
|
|
little-endian. Examples of this may be seen in the Power
|
|
|
Vector Library project (see <xref linkend="VIPR.intro.links"
|
|
|
/>).
|
|
|
</para>
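<para>
One way to catch accidental misuse is to check a constant PCV
before relying on bi-endian behavior. The helper below is a
hypothetical sketch, not part of any compiler or library. It
assumes that selecting a whole element means that each group of
control bytes holds consecutive byte indices that start at a
multiple of the element size within the 32-byte concatenation.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every group of elem_size control bytes selects one
   whole, naturally aligned element of a || b, and 0 otherwise.
   The PCV is given in element order, as written in source code. */
int pcv_selects_whole_elements(const uint8_t pcv[16], size_t elem_size)
{
    assert(elem_size == 1 || elem_size == 2 || elem_size == 4 ||
           elem_size == 8 || elem_size == 16);

    for (size_t i = 0; i < 16; i += elem_size) {
        /* Only the bottom 5 bits of each control byte are used. */
        unsigned first = pcv[i] & 31u;

        if (first % elem_size != 0)
            return 0;
        for (size_t k = 1; k < elem_size; k++)
            if ((pcv[i + k] & 31u) != first + k)
                return 0;
    }
    return 1;
}
```
]]></programlisting>
<para>
For 4-byte elements, this check accepts the first PCV used in this
section and rejects the second.
</para>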
|
|
|
</section>
|
|
|
</section>
|
|
|
|
|
|
</chapter>
|