Vector Programming Interfaces
To ensure portability of applications optimized to exploit the SIMD
functions of Power ISA processors, the ELF V2 ABI defines a set of
functions and data types for SIMD programming. ELF V2-compliant compilers
will provide suitable support for these functions, preferably as built-in
functions that translate to one or more Power ISA instructions.
Compilers are encouraged, but not required, to provide built-in
functions to access individual instructions in the IBM POWER® instruction
set architecture. In most cases, each such built-in function should provide
direct access to the underlying instruction.
However, to ease porting between little-endian (LE) and big-endian
(BE) POWER systems, and between POWER and other platforms, it is preferable
that some built-in functions provide the same semantics on both LE and BE
POWER systems, even if this means that the built-in functions are
implemented with different instruction sequences for LE and BE. To achieve
this, the vector built-in functions form a set of operations derived from
the hardware operations that the Power vector SIMD instructions provide.
Unlike traditional “hardware intrinsic” built-in functions, no fixed
mapping exists between these built-in functions and the generated hardware
instruction sequence. Rather, the compiler is free to generate optimized
instruction sequences that implement the semantics of the program specified
by the programmer using these built-in functions.
This is primarily applicable to the vector facility of the Power ISA,
also known as Power SIMD, consisting of the VMX (or Altivec) and VSX
instructions. This set of instructions operates on groups of 2, 4, 8, or 16
vector elements at a time in 128-bit registers. On a big-endian POWER
platform, vector elements are loaded from memory into a register so that
the 0th element occupies the high-order bits of the register, and the
(N – 1)th element occupies the low-order bits of the register. This is
referred to as big-endian element order. On a little-endian POWER platform,
vector elements are loaded from memory such that the 0th element occupies
the low-order bits of the register, and the (N – 1)th element occupies the
high-order bits. This is referred to as little-endian element order.
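The two element orders can be illustrated with a small plain C model. This is only a sketch: the enum and function names are hypothetical, and the register is modeled as 16 bytes counted from the left (high-order) end rather than accessed through any real vector facility.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names used only for this illustration. */
enum element_order { ORDER_BIG, ORDER_LITTLE };

/* Byte offset, counted from the left (high-order) end of a 16-byte
   register, at which element `index` of `elem_size` bytes begins. */
size_t element_offset_from_left(size_t index, size_t elem_size,
                                enum element_order order)
{
    assert(elem_size > 0 && 16 % elem_size == 0);
    size_t n = 16 / elem_size;              /* number of elements, N */
    assert(index < n);

    if (order == ORDER_BIG)
        return index * elem_size;           /* element 0 is left-most */
    return (n - 1 - index) * elem_size;     /* element 0 is right-most */
}

void demo_element_offsets(void)
{
    /* Four 4-byte elements: element 0 sits at opposite ends. */
    assert(element_offset_from_left(0, 4, ORDER_BIG) == 0);
    assert(element_offset_from_left(0, 4, ORDER_LITTLE) == 12);
    assert(element_offset_from_left(3, 4, ORDER_LITTLE) == 0);
}
```

Calling demo_element_offsets() from a test program checks that element 0 and element N – 1 trade places between the two orders.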
Vector Data Types
Languages provide support for the data types in
to represent vector data types
stored in vector registers.
For the C and C++ programming languages (and related/derived
languages), these data types may be accessed based on the type names listed
in
when Power ISA SIMD language
extensions are enabled using either the vector or __vector keywords.
For the Fortran language, the Fortran Vector Data Types table
gives the correspondence between Fortran
and C/C++ language types.
The assignment operator always performs a byte-by-byte data copy for
vector data types.
Like other C/C++ language types, vector types may be defined to have
const or volatile properties. Vector data types can be defined as being in
static, auto, and register storage.
Pointers to vector types are defined like pointers of other C/C++
types. Pointers to objects may be defined to have const and volatile
properties. While the preferred alignment for vector data types is a
multiple of 16 bytes, pointers may point to vector objects at an arbitrary
alignment.
The preferred way to access vectors at an application-defined address
is by using vector pointers and the C/C++ dereference operator *. As with
other C/C++ data types, the array reference operator [ ] may be applied to
a vector pointer, with the usual meaning, to access the n-th vector object
relative to that pointer. The use of vector
built-in functions such as vec_xl and vec_xst is discouraged except for
languages where no dereference operators are available.
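vector char vca;
vector char vcb;
vector int via;
int a[4];
void *vp;                       /* assumed to point to valid vector data */
via = *(vector int *) &a[0];    /* load four ints from a[] */
vca = (vector char) via;        /* reinterpret the same 16 bytes */
vcb = vca;                      /* byte-by-byte copy */
vca = *(vector char *)vp;       /* load through a vector pointer */
*(vector char *)&a[0] = vca;    /* store 16 bytes back into a[] */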
Compilers are expected to recognize and optimize multiple operations
that can be optimized into a single hardware instruction. For example, a
load and splat hardware instruction might be generated for the following
sequence:
double *double_ptr;
register vector double vd = vec_splats(*double_ptr);
Vector Operators
In addition to the dereference and assignment operators, the Power
SIMD Vector Programming API provides the usual operators that are valid on
pointers; these operators are also valid for pointers to vector
types.
The traditional C/C++ operators are defined on vector types with “do
all” semantics for unary and binary +, unary and binary –, binary *, binary
%, and binary / as well as the unary and binary logical and comparison
operators.
For unary operators, the specified operation is performed on the
corresponding base element of the single operand to derive the result value
for each vector element of the vector result. The result type of unary
operations is the type of the single input operand.
For binary operators, the specified operation is performed on the
corresponding base elements of both operands to derive the result value for
each vector element of the vector result. Both operands of the binary
operators must have the same vector type with the same base element type.
The result of binary operators is the same type as the type of the input
operands.
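The element-wise behavior of these operators can be sketched in portable C. As an assumption of this sketch, the GCC/Clang generic vector extension (the vector_size attribute) stands in for the vector int type, so that the example also compiles on non-Power targets; the type name v4si and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for "vector signed int" (an assumption of this sketch). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Binary operators combine corresponding elements of both operands. */
v4si add_then_scale(v4si a, v4si b)
{
    return (a + b) * b;
}

/* Unary minus is applied to every element of its single operand. */
v4si negate_all(v4si a)
{
    return -a;
}

void demo_do_all(void)
{
    v4si a = {1, 2, 3, 4};
    v4si b = {10, 20, 30, 40};
    v4si r = add_then_scale(a, b);
    v4si n = negate_all(a);

    assert(r[0] == 110 && r[1] == 440 && r[2] == 990 && r[3] == 1760);
    assert(n[0] == -1 && n[3] == -4);
}
```

Both operands of each binary operator have the same vector type, and the result has that type as well, which matches the rules stated above.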
Further, the array reference operator may be applied to vector data
types, yielding an l-value corresponding to the specified element in
accordance with the vector element numbering rules (see
Vector Layout and Element Numbering). An l-value may either be
assigned a new value or accessed for reading its value.
Vector Layout and Element Numbering
Vector data types consist of a homogeneous sequence of elements of
the base data type specified in the vector data type. Individual elements
of a vector can be addressed by a vector element number. Element numbers
can be established either by counting from the “left” of a register and
assigning the left-most element the element number 0, or from the “right”
of the register and assigning the right-most element the element number
0.
In big-endian environments, establishing element counts from the left
makes the element stored at the lowest memory address the lowest-numbered
element. Thus, when vectors and arrays of a given base data type are
overlaid, vector element 0 corresponds to array element 0, vector element 1
corresponds to array element 1, and so forth.
In little-endian environments, establishing element counts from the
right makes the element stored at the lowest memory address the
lowest-numbered element. Thus, when vectors and arrays of a given base data
type are overlaid, vector element 0 will correspond to array element 0,
vector element 1 will correspond to array element 1, and so forth.
Consequently, the vector numbering schemes can be described as
big-endian and little-endian vector layouts and vector element numberings.
(The term “endian” comes from the endian debates presented in
Gulliver's Travels by Jonathan Swift.)
For internal consistency, in the ELF V2 ABI, the default vector
layout and vector element ordering in big-endian environments shall be big
endian, and the default vector layout and vector element ordering in
little-endian environments shall be little endian.
This element numbering shall also be used by the [ ] accessor method
to vector elements provided as an extension of the C/C++ languages by some
compilers, as well as for other language extensions or library constructs
that directly or indirectly refer to elements by their element
number.
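The overlay property described above (vector element i corresponds to array element i in the default layout) can be checked with a short sketch. It assumes the GCC/Clang generic vector extension as a portable stand-in for vector int; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for "vector signed int" (an assumption of this sketch). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Copy the 16 bytes of an int array into a vector object. */
v4si load_from_array(const int32_t a[4])
{
    v4si v;
    memcpy(&v, a, sizeof v);
    return v;
}

void demo_overlay(void)
{
    const int32_t a[4] = {10, 20, 30, 40};
    v4si v = load_from_array(a);

    /* With natural element numbering, v[i] matches a[i] on either endianness. */
    for (int i = 0; i < 4; i++)
        assert(v[i] == a[i]);
}
```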
Application programs may query the vector element ordering in use
(that is, whether -qaltivec=be or -maltivec=be has been selected) by
testing the __VEC_ELEMENT_REG_ORDER__ macro. This macro has two possible
values:
__ORDER_LITTLE_ENDIAN__
Vector elements use little-endian element ordering.
__ORDER_BIG_ENDIAN__
Vector elements use big-endian element ordering.
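A program might test the macro as in the following sketch. The function name is hypothetical, and the fallback branch is an assumption added so that the code also compiles with compilers that do not define __VEC_ELEMENT_REG_ORDER__.

```c
#include <assert.h>
#include <string.h>

/* Report the vector element order selected at compile time. */
const char *vec_element_order(void)
{
#if defined(__VEC_ELEMENT_REG_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    __VEC_ELEMENT_REG_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return "little-endian";
#elif defined(__VEC_ELEMENT_REG_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __VEC_ELEMENT_REG_ORDER__ == __ORDER_BIG_ENDIAN__
    return "big-endian";
#else
    return "unknown";   /* macro not provided by this compiler or target */
#endif
}

void demo_element_order(void)
{
    const char *order = vec_element_order();
    assert(strcmp(order, "little-endian") == 0 ||
           strcmp(order, "big-endian") == 0 ||
           strcmp(order, "unknown") == 0);
}
```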
Vector Built-in Functions
The Power language environments provide a well-known set of built-in
functions for the Power SIMD instructions (including both Altivec/VMX and
VSX). A full description of these built-in functions is beyond the scope of
this ABI document. Most built-in functions are polymorphic, operating on a
variety of vector types (vectors of signed characters, vectors of unsigned
halfwords, and so forth).
Some of the Power SIMD (VMX/Altivec and/or VSX) hardware instructions
refer, implicitly or explicitly, to vector element numbers. For example,
the vspltb instruction has as one of its inputs an index into a vector. The
element at that index position is to be replicated in every element of the
output vector. For another example, the vmuleuh instruction operates on the
even-numbered elements of its input vectors. The hardware instructions
define these element numbers using big-endian element order, even when the
machine is running in little-endian mode. Thus, a built-in function that
maps directly to the underlying hardware instruction, regardless of the
target endianness, has the potential to confuse programmers on
little-endian platforms.
It is more useful to define built-in functions that map to these
instructions to use natural element order. That is, the explicit or
implicit element numbers specified by such built-in functions should be
interpreted using big-endian element order on a big-endian platform, and
using little-endian element order on a little-endian platform.
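For an instruction such as vspltb, whose element index is always interpreted in big-endian order, the translation from a natural element number is a simple subtraction on little-endian targets. The following sketch shows that mapping in plain C; the function name and the little_endian flag are hypothetical.

```c
#include <assert.h>

/* Translate a natural element index into the big-endian index that the
   hardware instruction expects, for a vector of n elements. */
unsigned hw_element_index(unsigned natural_index, unsigned n, int little_endian)
{
    assert(n > 0 && natural_index < n);
    return little_endian ? (n - 1u) - natural_index : natural_index;
}

void demo_hw_element_index(void)
{
    /* Splatting byte 1 of 16: unchanged on BE, index 14 on LE. */
    assert(hw_element_index(1, 16, 0) == 1);
    assert(hw_element_index(1, 16, 1) == 14);
}
```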
This ABI defines the following built-in functions to use natural
element order. The Implementation Notes column suggests possible ways to
implement little-endian (LE) versions of the built-in functions, although
designers of a compiler are free to use other methods to implement the
specified semantics as they see fit.
Endian-Sensitive Operations

| Built-In Function | Corresponding POWER Instructions | Implementation Notes |
|---|---|---|
| vec_bperm | | For LE unsigned long long ARGs, swap halves of ARG2 and of the result. |
| vec_cntlz_lsbb | | For LE, use vctzlsbb. |
| vec_cnttz_lsbb | | For LE, use vclzlsbb. |
| vec_extract | None | vec_extract (v, 3) is equivalent to v[3]. |
| vec_extract_fp32_from_shorth | | For LE, extract the left four elements. |
| vec_extract_fp32_from_shortl | | For LE, extract the right four elements. |
| vec_extract4b | | For LE, subtract the byte position from 12, and swap the halves of the result. |
| vec_first_match_index | | For LE, use vctz. |
| vec_first_match_index_or_eos | | For LE, use vctz. |
| vec_insert | None | vec_insert (x, v, 3) returns the vector v with element 3 modified to contain x. |
| vec_insert4b | | For LE, subtract the byte position from 12, and swap the halves of ARG1. |
| vec_mergee | vmrgew | Swap inputs and use vmrgow for LE. Phased in. |
| vec_mergeh | vmrghb, vmrghh, vmrghw | Swap inputs and use vmrglb, and so on, for LE. |
| vec_mergel | vmrglb, vmrglh, vmrglw | Swap inputs and use vmrghb, and so on, for LE. |
| vec_mergeo | vmrgow | Swap inputs and use vmrgew for LE. Phased in. |
| vec_mule | vmuleub, vmulesb, vmuleuh, vmulesh | Replace with vmuloub, and so on, for LE. |
| vec_mulo | vmuloub, vmulosb, vmulouh, vmulosh | Replace with vmuleub, and so on, for LE. |
| vec_pack | vpkuhum, vpkuwum | Swap input arguments for LE. |
| vec_packpx | vpkpx | Swap input arguments for LE. |
| vec_packs | vpkuhus, vpkshss, vpkuwus, vpkswss | Swap input arguments for LE. |
| vec_packsu | vpkuhus, vpkshus, vpkuwus, vpkswus | Swap input arguments for LE. |
| vec_perm | vperm | For LE, swap input arguments and complement the selection vector. |
| vec_splat | vspltb, vsplth, vspltw | Subtract the element number from N – 1 for LE. |
| vec_sum2s | vsum2sws | For LE, swap elements 0 and 1, and elements 2 and 3, of the second input argument; then swap elements 0 and 1, and elements 2 and 3, of the result vector. |
| vec_sums | vsumsws | For LE, use element 3 in little-endian order from the second input vector, and place the result in element 3 in little-endian order of the result vector. |
| vec_unpackh | vupkhsb, vupkhpx, vupkhsh | Use vupklsb, and so on, for LE. |
| vec_unpackl | vupklsb, vupklpx, vupklsh | Use vupkhsb, and so on, for LE. |
| vec_xl_len_r | | Let “cnt” be the number of bytes specified to be loaded by vec_xl_len_r. For LE, the bytes are loaded left justified, then shifted right 16 – cnt bytes or rotated left cnt bytes. |
| vec_xst_len_r | | Let “cnt” be the number of bytes specified to be stored by vec_xst_len_r. For LE, the bytes are shifted left 16 – cnt bytes or rotated right cnt bytes so they are left justified to be stored. |

Note: A function marked “Phased in” is optional. It is being phased in, and it may not
be available on all implementations.
Reminder: The assignment operator = is the
preferred way to assign values from one vector data type to
another vector data type in accordance with the C and C++
programming languages.
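The vec_perm entry in the Endian-Sensitive Operations table (swap the input arguments and complement the selection vector for LE) can be checked with a plain C model. This is a sketch under stated assumptions: the register is modeled as 16 bytes with index 0 as the left-most byte, hw_vperm models the big-endian byte numbering of the vperm instruction, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A 128-bit register as 16 bytes; index 0 is the left-most (high-order) byte. */
typedef struct { uint8_t b[16]; } reg128;

/* Model of vperm: selects bytes from va||vb using big-endian byte numbers. */
reg128 hw_vperm(reg128 va, reg128 vb, reg128 vc)
{
    reg128 r;
    for (int i = 0; i < 16; i++) {
        unsigned sel = vc.b[i] & 0x1Fu;
        r.b[i] = (sel < 16) ? va.b[sel] : vb.b[sel - 16];
    }
    return r;
}

/* Bitwise complement of every byte of the selection vector. */
reg128 complement(reg128 v)
{
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)~v.b[i];
    return v;
}

/* LE lowering from the table: swap inputs, complement the selector. */
reg128 vec_perm_le(reg128 a, reg128 b, reg128 c)
{
    return hw_vperm(b, a, complement(c));
}

/* Reference semantics in little-endian numbering: element j is byte 15 - j. */
reg128 vec_perm_le_reference(reg128 a, reg128 b, reg128 c)
{
    reg128 r;
    for (int j = 0; j < 16; j++) {
        unsigned sel = c.b[15 - j] & 0x1Fu;
        r.b[15 - j] = (sel < 16) ? a.b[15 - sel] : b.b[15 - (sel - 16)];
    }
    return r;
}

void demo_vec_perm(void)
{
    reg128 a, b, c;
    for (int i = 0; i < 16; i++) {
        a.b[i] = (uint8_t)(0xA0 + i);
        b.b[i] = (uint8_t)(0xB0 + i);
        c.b[i] = (uint8_t)(i * 7 + 3);   /* arbitrary selector pattern */
    }
    reg128 x = vec_perm_le(a, b, c);
    reg128 y = vec_perm_le_reference(a, b, c);
    assert(memcmp(x.b, y.b, 16) == 0);
}
```

The complement works because, after masking to five bits, ~sel equals 31 – sel, which converts a little-endian byte number within the 32-byte concatenation into the corresponding big-endian one; swapping the inputs accounts for the two halves trading places.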
Extended Data Movement Functions
The built-in functions in the Altivec Memory Access Built-In Functions table
map to Altivec/VMX load and
store instructions and provide access to the “auto-aligning” memory
instructions of the Altivec ISA, where low-order address bits are
discarded before performing a memory access. These instructions load and
store data in accordance with the program's current endian mode, so the
compiler does not need to adapt them for little-endian
operation during code generation:
Altivec Memory Access Built-In Functions

| Built-In Function | Corresponding POWER Instructions | Implementation Notes |
|---|---|---|
| vec_ld | lvx | Hardware works as a function of endian mode. |
| vec_lde | lvebx, lvehx, lvewx | Hardware works as a function of endian mode. |
| vec_ldl | lvxl | Hardware works as a function of endian mode. |
| vec_st | stvx | Hardware works as a function of endian mode. |
| vec_ste | stvebx, stvehx, stvewx | Hardware works as a function of endian mode. |
| vec_stl | stvxl | Hardware works as a function of endian mode. |
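The “auto-aligning” behavior of a full-vector access such as lvx or stvx amounts to clearing the low-order four bits of the effective address. The following sketch shows that address computation only; the function name is hypothetical, and no memory is actually accessed.

```c
#include <assert.h>
#include <stdint.h>

/* Address used by a 16-byte auto-aligning access: low 4 bits discarded. */
uintptr_t auto_aligned_address(uintptr_t ea)
{
    return ea & ~(uintptr_t)0xF;
}

void demo_auto_align(void)
{
    assert(auto_aligned_address(0x1007) == 0x1000);   /* rounded down */
    assert(auto_aligned_address(0x1010) == 0x1010);   /* already aligned */
}
```

A consequence is that an access through a misaligned address silently reads or writes the enclosing aligned 16-byte block rather than the bytes starting at that address.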
Previous versions of the Altivec built-in functions defined
intrinsics to access the Altivec instructions lvsl and lvsr, which could
be used in conjunction with vec_vperm and Altivec load and store
instructions for unaligned access. The vec_lvsl and vec_lvsr interfaces
are deprecated in favor of the interfaces specified here. For
compatibility, the built-in pseudo sequences published in previous VMX
documents continue to work with little-endian data layout and the
little-endian vector layout described in this document. However, the use
of these sequences in new code is discouraged and usually results in
worse performance. It is recommended (but not required) that compilers
issue a warning when these functions are used in little-endian
environments. It is recommended that programmers use the assignment
operator = or the vec_xl and vec_xst built-in functions to
access unaligned data streams.
The set of extended mnemonics in the Optional Built-In Memory Access Functions table
may be provided by some
compilers but is not required by the Power SIMD programming interfaces.
In particular, the assignment operator = has the same effect of
copying values between vector data types and is the preferable method
for assigning values because it gives the compiler more freedom to optimize data
allocation. The only use for these functions is to support some coding
patterns enabling big-endian vector layout code sequences in both
big-endian and little-endian environments. Memory access built-in
functions that specify a vector element format (that is, the w4 and d2
forms) are deprecated. They will be phased out in future versions of this
specification because vec_xl and vec_xst provide overloaded
layout-specific memory access based on the specified vector data
type.
Optional Built-In Memory Access Functions

| Built-In Function | Corresponding POWER Instructions | Little-Endian Implementation Notes |
|---|---|---|
| vec_xl | lxvd2x | lxvd2x ; xxpermdi |
| vec_xlw4 (see note) | lxvw4x | lxvd2x ; xxpermdi |
| vec_xld2 | lxvd2x | lxvd2x ; xxpermdi |
| vec_xst | stxvd2x | xxpermdi ; stxvd2x |
| vec_xstw4 | stxvw4x | xxpermdi ; stxvd2x |
| vec_xstd2 | stxvd2x | xxpermdi ; stxvd2x |

Note: Deprecated. Vector data type assignment and the overloaded vec_xl and
vec_xst vector built-in functions are the preferred forms for assigning
vector operations. Similarly, the use of __builtin_lxvd2x, __builtin_lxvw4x,
__builtin_stxvd2x, and __builtin_stxvw4x, available in some compilers, is
discouraged.
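The little-endian sequence "lxvd2x ; xxpermdi" can be understood with a model at doubleword granularity. The sketch below assumes that lxvd2x places the doubleword at the lower address in the left (high-order) half of the register, and that the xxpermdi is used with an immediate that swaps the two doublewords; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A vector-scalar register as two doublewords; `left` is high-order. */
typedef struct { uint64_t left, right; } vsr;

/* Model of lxvd2x: doubleword at the lower address goes to the left half. */
vsr model_lxvd2x(const uint64_t mem[2])
{
    vsr r = { mem[0], mem[1] };
    return r;
}

/* Model of xxpermdi used as a doubleword swap. */
vsr model_swap_doublewords(vsr r)
{
    vsr s = { r.right, r.left };
    return s;
}

/* Little-endian vec_xl as the two-instruction sequence from the table. */
vsr model_vec_xl_le(const uint64_t mem[2])
{
    return model_swap_doublewords(model_lxvd2x(mem));
}

/* Little-endian element order: element 0 is the right-most doubleword. */
uint64_t le_element(vsr r, unsigned i)
{
    assert(i < 2);
    return (i == 0) ? r.right : r.left;
}

void demo_vec_xl_le(void)
{
    const uint64_t mem[2] = {111, 222};
    vsr r = model_vec_xl_le(mem);
    assert(le_element(r, 0) == mem[0] && le_element(r, 1) == mem[1]);
}
```

Without the swap, element 0 in little-endian order would hold the doubleword from the higher address, which is why the load is followed by the permute and the store is preceded by it.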
The two optional built-in vector functions in the Optional Fixed Data Layout
Built-In Vector Functions table can be used to load and store
vectors with a big-endian element ordering (that is, bytes from low to
high memory will be loaded from left to right into a vector char
variable), independent of the -qaltivec=be or -maltivec=be setting. For
more information, see
.
Optional Fixed Data Layout Built-In Vector Functions

| Built-In Function | Corresponding POWER Instructions | Little-Endian Implementation Notes |
|---|---|---|
| vec_xl_be | lxvd2x | Use lxvd2x for vector long long, vector long, and vector double. Use lxvd2x followed by reversal of elements within each doubleword for all other data types. |
| vec_xst_be | stxvd2x | Use stxvd2x for vector long long, vector long, and vector double. Use stxvd2x following a reversal of elements within each doubleword for all other data types. |
In addition to the hardware-specific vector built-in functions,
implementations are expected to provide the interfaces listed in
the Built-In Interfaces for Inserting and Extracting Elements from a Vector table.
Built-In Interfaces for Inserting and Extracting Elements from a Vector

| Built-In Function | Implementation Notes |
|---|---|
| vec_extract | vec_extract (v, 3) is equivalent to v[3]. |
| vec_insert | vec_insert (x, v, 3) returns the vector v with element 3 modified to contain x. |
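The semantics of these two interfaces can be sketched in portable C. As an assumption, the GCC/Clang generic vector extension stands in for vector int, and extract_elem and insert_elem are hypothetical stand-ins that mirror the argument order of vec_extract and vec_insert.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for "vector signed int" (an assumption of this sketch). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Mirrors vec_extract (v, i): read element i, same as v[i]. */
int32_t extract_elem(v4si v, unsigned i)
{
    assert(i < 4);
    return v[i];
}

/* Mirrors vec_insert (x, v, i): return a copy of v with element i set to x. */
v4si insert_elem(int32_t x, v4si v, unsigned i)
{
    assert(i < 4);
    v[i] = x;           /* v is a by-value copy; the caller's vector is untouched */
    return v;
}

void demo_insert_extract(void)
{
    v4si v = {1, 2, 3, 4};
    v4si w = insert_elem(99, v, 3);

    assert(extract_elem(w, 3) == 99);
    assert(extract_elem(v, 3) == 4);    /* original is unchanged */
}
```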
Environments may provide the optional built-in vector functions
listed in the Optional Built-In Functions table
to adjust for endian behavior
by reversing the order of elements (reve) and bytes within elements
(revb).
Optional Built-In Functions

| Name | Description |
|---|---|
| vec_revb | Reverses the order of bytes within elements. |
| vec_reve | Reverses the order of elements. |
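The two reversals can be modeled on a plain array of four 32-bit elements. This is only an illustration of the described semantics, not the built-in functions themselves; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the four bytes of one 32-bit element. */
uint32_t reverse_bytes32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Model of vec_revb: reverse the bytes within each element. */
void model_revb_u32(uint32_t v[4])
{
    for (int i = 0; i < 4; i++)
        v[i] = reverse_bytes32(v[i]);
}

/* Model of vec_reve: reverse the order of the elements. */
void model_reve_u32(uint32_t v[4])
{
    uint32_t t;
    t = v[0]; v[0] = v[3]; v[3] = t;
    t = v[1]; v[1] = v[2]; v[2] = t;
}

void demo_rev(void)
{
    uint32_t v[4] = {0x11223344u, 0x55667788u, 0x99AABBCCu, 0xDDEEFF00u};

    model_revb_u32(v);
    assert(v[0] == 0x44332211u && v[3] == 0x00FFEEDDu);

    model_reve_u32(v);
    assert(v[0] == 0x00FFEEDDu && v[3] == 0x44332211u);
}
```

Applying both models to a 16-byte vector reverses all 16 bytes, which is the transformation needed to move between a fully big-endian and a fully little-endian view of the same data.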
Big-Endian Vector Layout in Little-Endian Environments
Because the vector layout and element numbering cannot be
represented in source code in an endian-neutral manner, code originating
from big-endian platforms may need to be compiled on little-endian
platforms, or vice versa. To simplify such porting, some
compilers may provide an additional bridge mode for some applications.
Note that such support only works for homogeneous data being loaded
into vector registers (that is, no unions or structs containing elements
of different sizes) and when those vectors are loaded from and stored to
memory with element-size-specific built-in vector memory functions of
and
. That is because, in this
mode, data within each element must be adjusted for little-endian data
representation while providing a big-endian layout and numbering of
vector elements within a vector.
Because of the internal contradiction of big-endian
vector layouts and little-endian data, such an environment will have
intrinsic limitations for the type of functionality that may be
offered. However, it may provide a useful bridge in the porting of
code using vector built-ins between environments having different
data layout models.
Compiler designers may implement additional built-in functions or
other mechanisms that use big-endian element ordering in little-endian
mode. For example, the GCC and IBM XL compilers define the options
-maltivec=be and -qaltivec=be, respectively, to allow programmers to
specify that the built-ins will generate big-endian hardware instructions
directly for the corresponding big-endian sequences in little-endian
mode. To ensure consistent element operation in this mode, the lvx
instructions and related instructions are changed to maintain a
big-endian data layout in registers by adding appropriate permute
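sequences as shown in the Altivec Built-In Vector Memory Access Functions
(BE Layout in LE Mode) table. The selected vector element
order is reflected in the __VEC_ELEMENT_REG_ORDER__ macro. See
Vector Layout and Element Numbering.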
Altivec Built-In Vector Memory Access Functions (BE Layout in LE Mode)

| Built-In Function | Corresponding POWER Instructions | BE Vector Layout in Little-Endian Mode Implementation Notes |
|---|---|---|
| vec_ld | lvx | Reverse elements with a vperm after load for LE based on vector base type. |
| vec_lde | lvebx, lvehx, lvewx | Reverse elements with a vperm after load for LE based on vector base type. |
| vec_ldl | lvxl | Reverse elements with a vperm after load for LE based on vector base type. |
| vec_st | stvx | Reverse elements with a vperm before store for LE based on vector base type. |
| vec_ste | stvebx, stvehx, stvewx | Reverse elements with a vperm before store for LE based on vector base type. |
| vec_stl | stvxl | Reverse elements with a vperm before store for LE based on vector base type. |
Memory accesses that may be unaligned can be performed
by using instructions (or instruction
sequences) that perform a little-endian load of the underlying vector data
type while maintaining big-endian element ordering. See
the Optional Built-In Memory Access Functions (BE Layout in LE Mode) table.
Optional Built-In Memory Access Functions (BE Layout in LE Mode)

| Built-In Function | Corresponding POWER Instructions | BE Vector Layout in Little-Endian Mode Implementation Notes |
|---|---|---|
| vec_xl | lxvd2x | Use lxvd2x for vector long long, vector long, and vector double. |
| vec_xlw4 (see note) | lxvw4x | Use lxvw4x for vector int and vector float. |
| vec_xld2 | lxvd2x | Use lxvd2x, followed by reversal of elements within each doubleword, for all other data types. |
| vec_xst | stxvd2x | Use stxvd2x for vector long long, vector long, and vector double. |
| vec_xstw4 | stxvw4x | Use stxvw4x for vector int and vector float. |
| vec_xstd2 | stxvd2x | Use stxvd2x, following a reversal of elements within each doubleword, for all other data types. |

Note: Deprecated. Vector data type assignment and the overloaded vec_xl and
vec_xst vector built-in functions are the preferred forms for assigning
vector operations. Similarly, the use of __builtin_lxvd2x, __builtin_lxvw4x,
__builtin_stxvd2x, and __builtin_stxvw4x, available in some compilers, is
discouraged.
The use of -maltivec=be or -qaltivec=be in
little-endian mode disables the transformations described
in
.
The operation of the assignment operator is never changed by a
setting such as -qaltivec=be or -maltivec=be.
Language-Specific Vector Support for Other Languages
Fortran
The Fortran Vector Data Types table shows the correspondence
between the C/C++ types described in this document and their Fortran
equivalents. In Fortran, the Boolean vector data types are represented by
VECTOR(UNSIGNED(n)).
Because the Fortran language does not support pointers, vector
built-in functions that expect pointers to a base type take an array
element reference to indicate the address of a memory location that is
the subject of a memory access built-in function.
Because the Fortran language does not support type casts, the
vec_convert and vec_concat built-in functions shown in
the Built-In Vector Conversion Function table are provided to perform
bit-exact type conversions between vector types.
Built-In Vector Conversion Function

Group / Description

VEC_CONCAT (ARG1, ARG2) (Fortran)
    Purpose:
        Concatenates two elements to form a vector.
    Result value:
        The resulting vector consists of the two scalar elements,
        ARG1 and ARG2, assigned to elements 0 and 1 (using the
        environment’s native endian numbering), respectively.
    Note: This function corresponds to the C/C++ vector
    constructor (vector type){a,b}. It is provided only for
    languages without vector constructors.
        vector signed long long vec_concat (signed long long, signed long long);
        vector unsigned long long vec_concat (unsigned long long, unsigned long long);
        vector double vec_concat (double, double);

VEC_CONVERT (V, MOLD)
    Purpose:
        Converts a vector to a vector of a given type.
    Class:
        Pure function
    Argument type and attributes:
        V     Must be an INTENT(IN) vector.
        MOLD  Must be an INTENT(IN) vector. If it is a
              variable, it need not be defined.
    Result type and attributes:
        The result is a vector of the same type as MOLD.
    Result value:
        The result is as if it were on the left-hand side of an
        intrinsic assignment with V on the right-hand side.
The Fortran Vector Data Types table gives the correspondence between
Fortran and C/C++ language types.
Fortran Vector Data Types

| XL Fortran Vector Type | XL C/C++ Vector Type |
|---|---|
| VECTOR(INTEGER(1)) | vector signed char |
| VECTOR(INTEGER(2)) | vector signed short |
| VECTOR(INTEGER(4)) | vector signed int |
| VECTOR(INTEGER(8)) | vector signed long long, vector signed long |
| VECTOR(INTEGER(16)) | vector signed __int128 |
| VECTOR(UNSIGNED(1)) | vector unsigned char |
| VECTOR(UNSIGNED(2)) | vector unsigned short |
| VECTOR(UNSIGNED(4)) | vector unsigned int |
| VECTOR(UNSIGNED(8)) | vector unsigned long long, vector unsigned long |
| VECTOR(UNSIGNED(16)) | vector unsigned __int128 |
| VECTOR(REAL(4)) | vector float |
| VECTOR(REAL(8)) | vector double |
| VECTOR(PIXEL) | vector pixel |
Library Interfaces
printf and scanf of Vector Data Types
Support for vector variable input and output
may be provided as an extension to the following
POSIX library functions for the new vector conversion format
strings:
scanf
fscanf
sscanf
wsscanf
printf
fprintf
sprintf
snprintf
wsprintf
vprintf
vfprintf
vsprintf
vwsprintf
(One sample implementation for such an extended specification is
libvecprintf.)
The size formatters are as follows:
vl or lv consumes one argument and modifies an existing integer
conversion, resulting in vector signed int, vector unsigned int, or
vector bool for output conversions or vector signed int * or vector
unsigned int * for input conversions. The data is then treated as a
series of four 4-byte components, with the subsequent conversion
format applied to each.
vh or hv consumes one argument and modifies an existing short
integer conversion, resulting in vector signed short or vector
unsigned short for output conversions or vector signed short * or
vector unsigned short * for input conversions. The data is treated as
a series of eight 2-byte components, with the subsequent conversion
format applied to each.
v consumes one argument and modifies a 1-byte integer, 1-byte
character, or 4-byte floating-point conversion. If the conversion is
a floating-point conversion, the result is vector float for output
conversion or vector float * for input conversion. The data is
treated as a series of four 4-byte floating-point components with the
subsequent conversion format applied to each. If the conversion is an
integer or character conversion, the result is either vector signed
char, vector unsigned char, or vector bool char for output
conversion, or vector signed char * or vector unsigned char * for
input conversions. The data is treated as a series of sixteen 1-byte
components, with the subsequent conversion format applied to
each.
vv consumes one argument and modifies an 8-byte floating-point
conversion. If the conversion is a floating-point conversion, the
result is vector double for output conversion or vector double * for
input conversion. The data is treated as a series of two 8-byte
floating-point components with the subsequent conversion format
applied to each. Integer and byte conversions are not defined for the
vv modifier.
As new vector types are defined, new format codes should
be defined to support scanf and printf of those types.
Any conversion format that can be applied to the singular form of a
vector-data type can be used with a vector form. The %d, %x, %X, %u, %i,
and %o integer conversions can be applied with the %lv, %vl, %hv, %vh,
and %v vector-length qualifiers. The %c character conversion can be
applied with the %v vector length qualifier. The %a, %A, %e, %E, %f, %F,
%g, and %G float conversions can be applied with the %v vector length
qualifier.
For input conversions, an optional separator character can be
specified, excluding white space preceding the separator. If no separator
is specified, the default separator is a space, including white space
characters preceding the separator, unless the conversion is c, in which
case the default separator is null.
For output conversions, an optional separator character can be
specified immediately preceding the vector size conversion. If no
separator is specified, the default separator is a space unless the
conversion is c. Then, the default separator is null.
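Because standard C libraries do not accept the vector size formatters, the described output behavior for a vector signed int can only be emulated in portable code. The sketch below applies an ordinary %d conversion to each of four 4-byte components and joins them with a separator string; the function name format_v4si and its parameters are hypothetical, and the array argument stands in for the vector value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format four 4-byte components with %d, joined by `sep`.
   Returns the number of characters written, or -1 if `buf` is too small. */
int format_v4si(char *buf, size_t size, const int32_t v[4], const char *sep)
{
    int total = 0;
    for (int i = 0; i < 4; i++) {
        size_t room = size - (size_t)total;
        int n = snprintf(buf + total, room, "%s%d",
                         i ? sep : "", (int)v[i]);
        if (n < 0 || (size_t)n >= room)
            return -1;          /* encoding error or output truncated */
        total += n;
    }
    return total;
}

void demo_format_v4si(void)
{
    const int32_t v[4] = {1, -2, 3, 4};
    char buf[64];

    /* Default separator for integer conversions is a space. */
    assert(format_v4si(buf, sizeof buf, v, " ") == 8);
    assert(strcmp(buf, "1 -2 3 4") == 0);
}
```

Passing an empty string as the separator corresponds to the null default separator described for the c conversion.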