Vector Programming Interfaces
Earlier versions of this ABI included a description of vector
programming interfaces and techniques for POWER®, along with an
appendix enumerating the supported vector built-in functions.
Most of this information is not part of the ABI and has been removed
from this version of the document. Readers interested in this material
should refer to the POWER Vector Intrinsics Programming Reference,
available from the OpenPOWER Foundation in its Technical Resources
Catalog.

To ensure portability of applications optimized to exploit the SIMD
functions of Power ISA processors, the ELF V2 ABI defines a set of
functions and data types for SIMD programming. ELF V2-compliant compilers
will provide suitable support for these functions, preferably as built-in
functions that translate to one or more Power ISA instructions.

Compilers are encouraged, but not required, to provide built-in
functions to access individual instructions in the IBM POWER® instruction
set architecture. In most cases, each such built-in function should provide
direct access to the underlying instruction.

However, to ease porting between little-endian (LE) and big-endian
(BE) POWER systems, and between POWER and other platforms, it is preferable
that some built-in functions provide the same semantics on both LE and BE
POWER systems, even if this means that the built-in functions are
implemented with different instruction sequences for LE and BE. To achieve
this, vector built-in functions provide a set of functions derived from the
set of hardware functions provided by the Power vector SIMD instructions.
Unlike traditional “hardware intrinsic” built-in functions, no fixed
mapping exists between these built-in functions and the generated hardware
instruction sequence. Rather, the compiler is free to generate optimized
instruction sequences that implement the semantics of the program specified
by the programmer using these built-in functions.

This is primarily applicable to the vector facility of the POWER ISA,
also known as Power SIMD, consisting of the VMX (or Altivec) and VSX
instructions. This set of instructions operates on groups of 2, 4, 8, or 16
vector elements at a time in 128-bit registers. On a big-endian POWER
platform, vector elements are loaded from memory into a register so that
the 0th element occupies the high-order bits of the register, and the
(N – 1)th element occupies the low-order bits of the register. This is
referred to as big-endian element order. On a little-endian POWER platform,
vector elements are loaded from memory such that the 0th element occupies
the low-order bits of the register, and the (N – 1)th element occupies the
high-order bits. This is referred to as little-endian element order.

Vector Data Types

Languages provide support for a set of data types that represent vector
data stored in vector registers.

For the C and C++ programming languages (and related/derived
languages), these data types may be accessed by their type names when
Power ISA SIMD language extensions are enabled using either the vector or
__vector keywords.

For the Fortran language, the Fortran Vector Data Types table gives a
correspondence of Fortran and C/C++ language types.

The assignment operator always performs a byte-by-byte data copy for
vector data types.

Like other C/C++ language types, vector types may be defined to have
const or volatile properties. Vector data types can be defined as being in
static, auto, and register storage.

Pointers to vector types are defined like pointers of other C/C++
types. Pointers to objects may be defined to have const and volatile
properties.

The preferred way to access vectors at an application-defined address
is by using vector pointers and the C/C++ dereference operator *. Similar
to other C/C++ data types, the array reference operator [ ] may be used to
access vector objects through a vector pointer, with the usual definition
of accessing the n-th vector relative to the vector pointer. The
dereference operator * may not be used to access data that is not aligned
at least to a quadword boundary. Built-in functions such as vec_xl and
vec_xst are provided for unaligned data access.

Compilers are expected to recognize and optimize multiple operations
that can be optimized into a single hardware instruction. For example, a
load and splat hardware instruction might be generated for the following
sequence:

    double *double_ptr;
    register vector double vd = vec_splats(*double_ptr);

Vector Operators

In addition to the dereference and assignment operators, the Power
SIMD Vector Programming API provides the usual operators that are valid on
pointers; these operators are also valid for pointers to vector types.

The traditional C/C++ operators are defined on vector types with “do
all” semantics for unary and binary +, unary and binary –, binary *, binary
%, and binary / as well as the unary and binary shift, logical, and
comparison operators, and the ternary ?: operator.

For unary operators, the specified operation is performed on the
corresponding base element of the single operand to derive the result value
for each vector element of the vector result. The result type of unary
operations is the type of the single input operand.

For binary operators, the specified operation is performed on the
corresponding base elements of both operands to derive the result value for
each vector element of the vector result. Both operands of the binary
operators must have the same vector type with the same base element type.
The result of binary operators is the same type as the type of the input
operands.

Further, the array reference operator may be applied to vector data
types, yielding an l-value corresponding to the specified element in
accordance with the vector element numbering rules (see Vector Layout and
Element Numbering). An l-value may either be assigned a new value or
accessed for reading its value.

Vector Layout and Element Numbering

Vector data types consist of a homogeneous sequence of elements of
the base data type specified in the vector data type. Individual elements
of a vector can be addressed by a vector element number. Element numbers
can be established either by counting from the “left” of a register and
assigning the left-most element the element number 0, or from the “right”
of the register and assigning the right-most element the element number 0.

In big-endian environments, establishing element counts from the left
makes the element stored at the lowest memory address the lowest-numbered
element. Thus, when vectors and arrays of a given base data type are
overlaid, vector element 0 corresponds to array element 0, vector element 1
corresponds to array element 1, and so forth.

In little-endian environments, establishing element counts from the
right makes the element stored at the lowest memory address the
lowest-numbered element. Thus, when vectors and arrays of a given base data
type are overlaid, vector element 0 corresponds to array element 0,
vector element 1 corresponds to array element 1, and so forth.

Consequently, the vector numbering schemes can be described as
big-endian and little-endian vector layouts and vector element numberings.
(The term “endian” comes from the endian debates presented in
Gulliver's Travels by Jonathan Swift.)

For internal consistency, in the ELF V2 ABI, the default vector
layout and vector element ordering in big-endian environments shall be big
endian, and the default vector layout and vector element ordering in
little-endian environments shall be little endian.

This element numbering shall also be used by the [ ] accessor method
to vector elements provided as an extension of the C/C++ languages by some
compilers, as well as for other language extensions or library constructs
that directly or indirectly refer to elements by their element number.

Application programs may query the vector element ordering in use
(that is, whether -qaltivec=be or -maltivec=be has been selected) by
testing the __VEC_ELEMENT_REG_ORDER__ macro. This macro has two possible
values:

__ORDER_LITTLE_ENDIAN__
    Vector elements use little-endian element ordering.
__ORDER_BIG_ENDIAN__
    Vector elements use big-endian element ordering.

Vector Built-in Functions

The Power language environments provide a well-known set of built-in
functions for the Power SIMD instructions (including both Altivec/VMX and
VSX). A full description of these built-in functions is beyond the scope of
this ABI document. Most built-in functions are polymorphic, operating on a
variety of vector types (vectors of signed characters, vectors of unsigned
halfwords, and so forth).

Some of the Power SIMD (VMX/Altivec and/or VSX) hardware instructions
refer, implicitly or explicitly, to vector element numbers. For example,
the vspltb instruction has as one of its inputs an index into a vector. The
element at that index position is to be replicated in every element of the
output vector. For another example, the vmuleuh instruction operates on the
even-numbered elements of its input vectors. The hardware instructions
define these element numbers using big-endian element order, even when the
machine is running in little-endian mode. Thus, a built-in function that
maps directly to the underlying hardware instruction, regardless of the
target endianness, has the potential to confuse programmers on
little-endian platforms.

It is more useful to define built-in functions that map to these
instructions to use natural element order. That is, the explicit or
implicit element numbers specified by such built-in functions should be
interpreted using big-endian element order on a big-endian platform, and
using little-endian element order on a little-endian platform.

This ABI defines the following built-in functions to use natural
element order. The Implementation Notes column suggests possible ways to
implement little-endian (LE) versions of the built-in functions, although
designers of a compiler are free to use other methods to implement the
specified semantics as they see fit.

Endian-Sensitive Operations

Built-In Function | Corresponding POWER Instructions | Implementation Notes
vec_bperm |  | For LE unsigned long long ARGs, swap halves of ARG2 and of the result.
vec_cntlz_lsbb |  | For LE, use vctzlsbb.
vec_cnttz_lsbb |  | For LE, use vclzlsbb.
vec_extract | None | vec_extract (v, 3) is equivalent to v[3].
vec_extract_fp32_from_shorth |  | For LE, extract the left four elements.
vec_extract_fp32_from_shortl |  | For LE, extract the right four elements.
vec_extract4b |  | For LE, subtract the byte position from 12, and swap the halves of the result.
vec_first_match_index |  | For LE, use vctz.
vec_first_match_index_or_eos |  | For LE, use vctz.
vec_insert | None | vec_insert (x, v, 3) returns the vector v with element 3 modified to contain x.
vec_insert4b |  | For LE, subtract the byte position from 12, and swap the halves of ARG1.
vec_mergee | vmrgew | Swap inputs and use vmrgow for LE. Phased in. [a]
vec_mergeh | vmrghb, vmrghh, vmrghw | Swap inputs and use vmrglb, and so on, for LE.
vec_mergel | vmrglb, vmrglh, vmrglw | Swap inputs and use vmrghb, and so on, for LE.
vec_mergeo | vmrgow | Swap inputs and use vmrgew for LE. Phased in. [a]
vec_mule | vmuleub, vmulesb, vmuleuh, vmulesh | Replace with vmuloub, and so on, for LE.
vec_mulo | vmuloub, vmulosb, vmulouh, vmulosh | Replace with vmuleub, and so on, for LE.
vec_pack | vpkuhum, vpkuwum | Swap input arguments for LE.
vec_packpx | vpkpx | Swap input arguments for LE.
vec_packs | vpkuhus, vpkshss, vpkuwus, vpkswss | Swap input arguments for LE.
vec_packsu | vpkuhus, vpkshus, vpkuwus, vpkswus | Swap input arguments for LE.
vec_perm | vperm | For LE, swap input arguments and complement the selection vector.
vec_splat | vspltb, vsplth, vspltw | Subtract the element number from N – 1 for LE.
vec_sum2s | vsum2sws | For LE, swap elements 0 and 1, and elements 2 and 3, of the second input argument; then swap elements 0 and 1, and elements 2 and 3, of the result vector.
vec_sums | vsumsws | For LE, use element 3 in little-endian order from the second input vector, and place the result in element 3 in little-endian order of the result vector.
vec_unpackh | vupkhsb, vupkhpx, vupkhsh | Use vupklsb, and so on, for LE.
vec_unpackl | vupklsb, vupklpx, vupklsh | Use vupkhsb, and so on, for LE.
vec_xl_len_r |  | For LE, the bytes are loaded left justified then shifted right 16 – cnt bytes or rotated left cnt bytes. Let “cnt” be the number of bytes specified to be loaded by vec_xl_len_r.
vec_xst_len_r |  | For LE, the bytes are shifted left 16 – cnt bytes or rotated right cnt bytes so they are left justified to be stored. Let “cnt” be the number of bytes specified to be stored by vec_xst_len_r.

[a] This optional function is being phased in, and it may not be
available on all implementations.
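
The vec_splat row above can be illustrated with a small scalar model. This
is only a sketch, not compiler code: the names reg4, hw_element, load_words,
hw_vspltw, natural_splat, and splat_demo are hypothetical, and a register is
modeled as four words indexed in the hardware's left-to-right (big-endian)
order.

```c
/* Scalar model of natural element order for a four-word vector.
 * All names are hypothetical; this illustrates the index adjustment
 * only and is not any compiler's implementation of vec_splat. */

enum { N = 4 };

/* A 128-bit register viewed as four words, indexed the way the hardware
 * numbers elements: position 0 is the left-most (high-order) word. */
typedef struct { int hw[N]; } reg4;

/* Map a natural element number to the hardware element number.
 * On BE the two agree; on LE the hardware number is (N - 1) - natural. */
static int hw_element(int natural, int little_endian)
{
    return little_endian ? (N - 1) - natural : natural;
}

/* Model of a vector load: array element i becomes natural element i. */
static reg4 load_words(const int mem[N], int little_endian)
{
    reg4 r;
    for (int i = 0; i < N; i++)
        r.hw[hw_element(i, little_endian)] = mem[i];
    return r;
}

/* Model of vspltw: replicate the word at hardware position uim. */
static reg4 hw_vspltw(reg4 a, int uim)
{
    reg4 r;
    for (int i = 0; i < N; i++)
        r.hw[i] = a.hw[uim];
    return r;
}

/* Natural-order splat: adjust the index for LE, then apply the
 * hardware operation unchanged. */
static reg4 natural_splat(reg4 a, int natural, int little_endian)
{
    return hw_vspltw(a, hw_element(natural, little_endian));
}

/* Load {10, 20, 30, 40}, splat natural element idx, and return the
 * replicated value. */
static int splat_demo(int idx, int little_endian)
{
    const int mem[N] = {10, 20, 30, 40};
    reg4 r = natural_splat(load_words(mem, little_endian), idx, little_endian);
    return r.hw[0];
}
```

In this model, splat_demo(1, 0) and splat_demo(1, 1) both return 20, the
value of array element 1, although the hardware index passed to the modeled
instruction differs (1 for BE, 2 for LE).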

Extended Data Movement Functions

The built-in functions in the Altivec Memory Access Built-In Functions
table map to Altivec/VMX load and store instructions and provide access to
the “auto-aligning” memory instructions of the Altivec ISA, where low-order
address bits are discarded before performing a memory access. These
instructions load and store data in accordance with the program's current
endian mode, and do not need to be adapted by the compiler to reflect
little-endian operation during code generation:

Altivec Memory Access Built-In Functions

Built-in Function | Corresponding POWER Instructions | Implementation Notes
vec_ld | lvx | Hardware works as a function of endian mode.
vec_lde | lvebx, lvehx, lvewx | Hardware works as a function of endian mode.
vec_ldl | lvxl | Hardware works as a function of endian mode.
vec_st | stvx | Hardware works as a function of endian mode.
vec_ste | stvebx, stvehx, stvewx | Hardware works as a function of endian mode.
vec_stl | stvxl | Hardware works as a function of endian mode.
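
The “auto-aligning” behavior can be sketched as address arithmetic. The
function below is a hypothetical model, not a built-in: it assumes the
quadword forms (lvx, lvxl, stvx, stvxl), which ignore the low-order four
bits of the effective address so that the access always covers an aligned
16-byte block.

```c
#include <stdint.h>

/* Hypothetical model of the address actually accessed by a quadword
 * auto-aligning load or store: the low-order four address bits are
 * discarded, rounding the address down to a 16-byte boundary. */
static uintptr_t auto_aligned_address(uintptr_t ea)
{
    return ea & ~(uintptr_t)0xF;
}
```

For example, a vec_ld through a pointer holding 0x1003 would, under this
model, read the quadword at 0x1000 rather than 16 bytes starting at 0x1003,
which is why these built-in functions are unsuitable for unaligned data
unless combined with additional permute steps.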

Previous versions of the Altivec built-in functions defined
intrinsics to access the Altivec instructions lvsl and lvsr, which could
be used in conjunction with vec_vperm and Altivec load and store
instructions for unaligned access. The vec_lvsl and vec_lvsr interfaces
are deprecated in favor of the interfaces specified here. For
compatibility, the built-in pseudo sequences published in previous VMX
documents continue to work with little-endian data layout and the
little-endian vector layout described in this document. However, the use
of these sequences in new code is discouraged and usually results in
worse performance. It is recommended (but not required) that compilers
issue a warning when these functions are used in little-endian
environments. It is recommended that programmers use the vec_xl and
vec_xst vector built-in functions to access unaligned data streams.

The built-in functions in the VSX Memory Access Built-In Functions table
provide unaligned access to data in memory that is to be copied to or from
a variable having vector data type. Memory access built-in functions that
specify a vector element format (that is, the w4 and d2 forms) are
deprecated. They will be phased out in future versions of this
specification because vec_xl and vec_xst provide overloaded
layout-specific memory access based on the specified vector data type.

VSX Memory Access Built-In Functions

Built-in Function | Corresponding POWER Instructions | Little-Endian Implementation Notes
vec_xl | lxvd2x | lxvd2x ; xxpermdi
vec_xlw4 [a] | lxvw4x | lxvd2x ; xxpermdi
vec_xld2 | lxvd2x | lxvd2x ; xxpermdi
vec_xst | stxvd2x | xxpermdi ; stxvd2x
vec_xstw4 | stxvw4x | xxpermdi ; stxvd2x
vec_xstd2 | stxvd2x | xxpermdi ; stxvd2x

[a] Deprecated. Vector data type assignment and the overloaded vec_xl and
vec_xst vector built-in functions are the preferred forms for assigning
vector operations. Similarly, the use of __builtin_lxvd2x,
__builtin_lxvw4x, __builtin_stxvd2x, and __builtin_stxvw4x, available in
some compilers, is discouraged.
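
A minimal sketch of unaligned access with vec_xl and vec_xst follows. The
helper names add_one_x4 and add_one_demo are hypothetical. The VSX path
assumes a compiler that provides altivec.h with the overloaded vec_xl,
vec_xst, vec_add, and vec_splats built-in functions; the #else branch is
only a portable stand-in so the sketch also builds on targets without VSX.

```c
#include <string.h>
#if defined(__VSX__)
#include <altivec.h>
#endif

/* Hypothetical helper: add 1 to each of the four ints stored at p.
 * p needs only int alignment, not quadword alignment. */
static void add_one_x4(int *p)
{
#if defined(__VSX__)
    __vector signed int v = vec_xl(0, p);   /* unaligned-capable load  */
    v = vec_add(v, vec_splats(1));          /* element-wise add        */
    vec_xst(v, 0, p);                       /* unaligned-capable store */
#else
    /* Portable stand-in (assumption: used only where VSX is absent). */
    int tmp[4];
    memcpy(tmp, p, sizeof tmp);
    for (int i = 0; i < 4; i++)
        tmp[i] += 1;
    memcpy(p, tmp, sizeof tmp);
#endif
}

/* Usage: update elements 1..4 of an array; a + 1 is generally not
 * quadword aligned. Returns 1 if only those elements changed. */
static int add_one_demo(void)
{
    int a[6] = {0, 1, 2, 3, 4, 5};
    add_one_x4(a + 1);
    return a[0] == 0 && a[1] == 2 && a[2] == 3 &&
           a[3] == 4 && a[4] == 5 && a[5] == 5;
}
```

Because the vector type of the result selects the overload, no
element-format suffix (w4 or d2) is needed, which is the reason the text
gives for deprecating those forms.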

The two optional built-in vector functions in the Optional Fixed Data
Layout Built-In Vector Functions table can be used to load and store
vectors with a big-endian element ordering (that is, bytes from low to
high memory will be loaded from left to right into a vector char
variable), independent of the -qaltivec=be or -maltivec=be setting. For
more information, see Big-Endian Vector Layout in Little-Endian
Environments.

Optional Fixed Data Layout Built-In Vector Functions

Built-in Function | Corresponding POWER Instructions | Little-Endian Implementation Notes
vec_xl_be | lxvd2x | Use lxvd2x for vector long long, vector long [a], and vector double. Use lxvd2x followed by reversal of elements within each doubleword for all other data types.
vec_xst_be | stxvd2x | Use stxvd2x for vector long long, vector long [a], and vector double. Use stxvd2x following a reversal of elements within each doubleword for all other data types.

[a] The vector long types are deprecated due to their ambiguity between
32-bit and 64-bit environments. The use of the vector long long types is
preferred.

In addition to the hardware-specific vector built-in functions,
implementations are expected to provide the interfaces listed in the
following table.

Built-In Interfaces for Inserting and Extracting Elements from a Vector

Built-In Function | Implementation Notes
vec_extract | vec_extract (v, 3) is equivalent to v[3].
vec_insert | vec_insert (x, v, 3) returns the vector v with element 3 modified to contain x.
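
The semantics of the two interfaces can be sketched without altivec.h. The
example below is an assumption-laden stand-in: it uses the GCC/Clang
generic vector extension (the vector_size attribute) in place of vector
signed int, and the functions extract_elem, insert_elem, and
insert_extract_demo are hypothetical names that mirror vec_extract and
vec_insert rather than calling them.

```c
/* Stand-in for "vector signed int": a GCC/Clang generic vector of four
 * ints. This is an assumption made so the sketch builds off POWER. */
typedef int v4si __attribute__((vector_size(16)));

/* Mirrors vec_extract (v, i): the same value as v[i]. */
static int extract_elem(v4si v, int i)
{
    return v[i];
}

/* Mirrors vec_insert (x, v, i): returns a copy of v in which element i
 * holds x. The caller's vector is passed by value and is not changed. */
static v4si insert_elem(int x, v4si v, int i)
{
    v[i] = x;
    return v;
}

/* Usage: element 3 is the fourth element in natural element order. */
static int insert_extract_demo(void)
{
    v4si v = {10, 20, 30, 40};
    v4si w = insert_elem(99, v, 3);
    return extract_elem(v, 3) == 40 &&   /* original is untouched   */
           extract_elem(w, 3) == 99 &&   /* element 3 was replaced  */
           extract_elem(w, 2) == 30;     /* other elements are kept */
}
```

Element numbers here follow the array-style numbering of the initializer,
which corresponds to natural element order as defined in this chapter.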

Environments may provide the optional built-in vector functions
listed in the Optional Built-In Functions table to adjust for endian
behavior by reversing the order of elements (reve) and bytes within
elements (revb).

Optional Built-In Functions

Name | Description
vec_revb | Reverses the order of bytes within elements.
vec_reve | Reverses the order of elements.
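
The two reversals can be contrasted with a scalar model. The sketch below
does not call vec_revb or vec_reve; u32x4, swap_bytes32, model_revb,
model_reve, and rev_demo are hypothetical names, and the model assumes a
vector of four 4-byte elements.

```c
#include <stdint.h>

/* Hypothetical model of a vector of four 4-byte elements. */
typedef struct { uint32_t e[4]; } u32x4;

/* Reverse the four bytes of one element. */
static uint32_t swap_bytes32(uint32_t x)
{
    return (x >> 24) |
           ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) |
           (x << 24);
}

/* Model of vec_revb: bytes are reversed within each element;
 * the elements stay in place. */
static u32x4 model_revb(u32x4 v)
{
    u32x4 r;
    for (int i = 0; i < 4; i++)
        r.e[i] = swap_bytes32(v.e[i]);
    return r;
}

/* Model of vec_reve: the elements are reversed; the bytes inside
 * each element are unchanged. */
static u32x4 model_reve(u32x4 v)
{
    u32x4 r;
    for (int i = 0; i < 4; i++)
        r.e[i] = v.e[3 - i];
    return r;
}

/* Usage: returns 1 when both models behave as described. */
static int rev_demo(void)
{
    u32x4 v = {{0x11223344u, 2u, 3u, 4u}};
    u32x4 b = model_revb(v);
    u32x4 r = model_reve(v);
    return b.e[0] == 0x44332211u &&
           r.e[0] == 4u && r.e[3] == 0x11223344u;
}
```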

Big-Endian Vector Layout in Little-Endian Environments

Because the vector layout and element numbering cannot be
represented in source code in an endian-neutral manner, code originating
from big-endian platforms may need to be compiled on little-endian
platforms, or vice versa. To simplify such application porting, some
compilers may provide an additional bridge mode to enable a simplified
porting for some applications.

Note that such support only works for homogeneous data being loaded
into vector registers (that is, no unions or structs containing elements
of different sizes) and when those vectors are loaded from and stored to
memory with the element-size-specific built-in vector memory functions of
the Altivec Built-In Vector Memory Access Functions (BE Layout in LE Mode)
table and the VSX Built-In Memory Access Functions (BE Layout in LE Mode)
table. That is because, in this mode, data within each element must be
adjusted for little-endian data representation while providing a
big-endian layout and numbering of vector elements within a vector.

Because of the internal contradiction of big-endian vector layouts and
little-endian data, such an environment will have intrinsic limitations
for the type of functionality that may be offered. However, it may provide
a useful bridge in the porting of code using vector built-ins between
environments having different data layout models.

Compiler designers may implement additional built-in functions or
other mechanisms that use big-endian element ordering in little-endian
mode. For example, the GCC and IBM XL compilers define the options
-maltivec=be and -qaltivec=be, respectively, to allow programmers to
specify that the built-ins will generate big-endian hardware instructions
directly for the corresponding big-endian sequences in little-endian
mode. To ensure consistent element operation in this mode, the lvx
instructions and related instructions are changed to maintain a
big-endian data layout in registers by adding appropriate permute
sequences as shown in the Altivec Built-In Vector Memory Access Functions
(BE Layout in LE Mode) table. The selected vector element order is
reflected in the __VEC_ELEMENT_REG_ORDER__ macro. See Vector Layout and
Element Numbering.
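
A program can test the macro at compile time. The following is a minimal
sketch: vec_element_order is a hypothetical helper, and the "unknown"
branch is an assumption added to cover compilers or targets that do not
define __VEC_ELEMENT_REG_ORDER__.

```c
/* Hypothetical helper: report the vector element order that was
 * selected when this translation unit was compiled. */
static const char *vec_element_order(void)
{
#if defined(__VEC_ELEMENT_REG_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__VEC_ELEMENT_REG_ORDER__ == __ORDER_BIG_ENDIAN__)
    return "big-endian";
#elif defined(__VEC_ELEMENT_REG_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    (__VEC_ELEMENT_REG_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    return "little-endian";
#else
    /* Assumption: the macro is not provided by this compiler or target. */
    return "unknown";
#endif
}
```

Code that depends on a particular element order can use the same
preprocessor tests to select between alternative implementations or to
stop the build with #error.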

Altivec Built-In Vector Memory Access Functions (BE Layout in LE Mode)

Built-In Function | Corresponding POWER Instructions | BE Vector Layout in Little-Endian Mode Implementation Notes
vec_ld | lvx | Reverse elements with a vperm after load for LE based on vector base type.
vec_lde | lvebx, lvehx, lvewx | Reverse elements with a vperm after load for LE based on vector base type.
vec_ldl | lvxl | Reverse elements with a vperm after load for LE based on vector base type.
vec_st | stvx | Reverse elements with a vperm before store for LE based on vector base type.
vec_ste | stvebx, stvehx, stvewx | Reverse elements with a vperm before store for LE based on vector base type.
vec_stl | stvxl | Reverse elements with a vperm before store for LE based on vector base type.

Potentially unaligned memory accesses may be performed by using
instructions (or instruction sequences) that perform a little-endian load
of the underlying vector data type while maintaining big-endian element
ordering. See the VSX Built-In Memory Access Functions (BE Layout in LE
Mode) table.

VSX Built-In Memory Access Functions (BE Layout in LE Mode)

Built-In Function | Corresponding POWER Instructions | BE Vector Layout in Little-Endian Mode Implementation Notes
vec_xl | lxvd2x | Use lxvd2x for vector long long, vector long [a], and vector double.
vec_xlw4 [b] | lxvw4x | Use lxvw4x for vector int and vector float.
vec_xld2 | lxvd2x | Use lxvd2x, followed by reversal of elements within each doubleword, for all other data types.
vec_xst | stxvd2x | Use stxvd2x for vector long long, vector long, and vector double.
vec_xstw4 | stxvw4x | Use stxvw4x for vector int and vector float.
vec_xstd2 | stxvd2x | Use stxvd2x, following a reversal of elements within each doubleword, for all other data types.

[a] The vector long types are deprecated due to their ambiguity between
32-bit and 64-bit environments. The use of the vector long long types is
preferred.

[b] Deprecated. Vector data type assignment and the overloaded vec_xl and
vec_xst vector built-in functions are the preferred forms for assigning
vector operations. Similarly, the use of __builtin_lxvd2x,
__builtin_lxvw4x, __builtin_stxvd2x, and __builtin_stxvw4x, available in
some compilers, is discouraged.

The use of -maltivec=be or -qaltivec=be in little-endian mode disables
the transformations described in the Endian-Sensitive Operations table.

The operation of the assignment operator is never changed by a
setting such as -qaltivec=be or -maltivec=be.

Language-Specific Vector Support for Other Languages

Fortran

The Fortran Vector Data Types table shows the correspondence
between the C/C++ types described in this document and their Fortran
equivalents. In Fortran, the Boolean vector data types are represented by
VECTOR(UNSIGNED(n)).

Because the Fortran language does not support pointers, vector
built-in functions that expect pointers to a base type take an array
element reference to indicate the address of a memory location that is
the subject of a memory access built-in function.

Because the Fortran language does not support type casts, the
vec_convert and vec_concat built-in functions shown in the Built-In
Vector Conversion Functions table are provided to perform
bit-exact type conversions between vector types.

Built-In Vector Conversion Functions

Group: VEC_CONCAT (ARG1, ARG2) (Fortran)
Description:
    Purpose: Concatenates two elements to form a vector.
    Result value: The resulting vector consists of the two scalar
        elements, ARG1 and ARG2, assigned to elements 0 and 1 (using the
        environment’s native endian numbering), respectively.
    Note: This function corresponds to the C/C++ vector constructor
        (vector type){a,b}. It is provided only for languages without
        vector constructors.
    vector signed long long vec_concat (signed long long, signed long long);
    vector unsigned long long vec_concat (unsigned long long, unsigned long long);
    vector double vec_concat (double, double);

Group: VEC_CONVERT(V, MOLD)
Description:
    Purpose: Converts a vector to a vector of a given type.
    Class: Pure function
    Argument type and attributes:
        V     Must be an INTENT(IN) vector.
        MOLD  Must be an INTENT(IN) vector. If it is a variable, it need
              not be defined.
    Result type and attributes: The result is a vector of the same type
        as MOLD.
    Result value: The result is as if it were on the left-hand side of an
        intrinsic assignment with V on the right-hand side.

The Fortran Vector Data Types table gives a correspondence of
Fortran and C/C++ language types.

Fortran Vector Data Types

XL Fortran Vector Type | XL C/C++ Vector Type
VECTOR(INTEGER(1)) | vector signed char
VECTOR(INTEGER(2)) | vector signed short
VECTOR(INTEGER(4)) | vector signed int
VECTOR(INTEGER(8)) | vector signed long long, vector signed long [a]
VECTOR(INTEGER(16)) | vector signed __int128
VECTOR(UNSIGNED(1)) | vector unsigned char
VECTOR(UNSIGNED(2)) | vector unsigned short
VECTOR(UNSIGNED(4)) | vector unsigned int
VECTOR(UNSIGNED(8)) | vector unsigned long long, vector unsigned long
VECTOR(UNSIGNED(16)) | vector unsigned __int128
VECTOR(REAL(4)) | vector float
VECTOR(REAL(8)) | vector double
VECTOR(PIXEL) | vector pixel

[a] The vector long types are deprecated due to their ambiguity between
32-bit and 64-bit environments. The use of the vector long long types is
preferred.

Library Interfaces

printf and scanf of Vector Data Types

Support for vector variable input and output may be provided as an
extension to the following POSIX library functions for the new vector
conversion format strings:

    scanf
    fscanf
    sscanf
    wsscanf
    printf
    fprintf
    sprintf
    snprintf
    wsprintf
    vprintf
    vfprintf
    vsprintf
    vwsprintf

(One sample implementation for such an extended specification is
libvecprintf.)

The size formatters are as follows:

vl or lv consumes one argument and modifies an existing integer
conversion, resulting in vector signed int, vector unsigned int, or
vector bool for output conversions or vector signed int * or vector
unsigned int * for input conversions. The data is then treated as a
series of four 4-byte components, with the subsequent conversion
format applied to each.

vh or hv consumes one argument and modifies an existing short
integer conversion, resulting in vector signed short or vector
unsigned short for output conversions or vector signed short * or
vector unsigned short * for input conversions. The data is treated as
a series of eight 2-byte components, with the subsequent conversion
format applied to each.

v consumes one argument and modifies a 1-byte integer, 1-byte
character, or 4-byte floating-point conversion. If the conversion is
a floating-point conversion, the result is vector float for output
conversion or vector float * for input conversion. The data is
treated as a series of four 4-byte floating-point components with the
subsequent conversion format applied to each. If the conversion is an
integer or character conversion, the result is either vector signed
char, vector unsigned char, or vector bool char for output
conversion, or vector signed char * or vector unsigned char * for
input conversions. The data is treated as a series of sixteen 1-byte
components, with the subsequent conversion format applied to each.

vv consumes one argument and modifies an 8-byte floating-point
conversion. If the conversion is a floating-point conversion, the
result is vector double for output conversion or vector double * for
input conversion. The data is treated as a series of two 8-byte
floating-point components with the subsequent conversion format
applied to each. Integer and byte conversions are not defined for the
vv modifier.

As new vector types are defined, new format codes should be defined to
support scanf and printf of those types.

Any conversion format that can be applied to the singular form of a
vector-data type can be used with a vector form. The %d, %x, %X, %u, %i,
and %o integer conversions can be applied with the %lv, %vl, %hv, %vh,
and %v vector-length qualifiers. The %c character conversion can be
applied with the %v vector-length qualifier. The %a, %A, %e, %E, %f, %F,
%g, and %G float conversions can be applied with the %v vector-length
qualifier.

For input conversions, an optional separator character can be
specified, excluding white space preceding the separator. If no separator
is specified, the default separator is a space, including white space
characters preceding the separator, unless the conversion is c, in which
case the default separator is null.

For output conversions, an optional separator character can be
specified immediately preceding the vector size conversion. If no
separator is specified, the default separator is a space unless the
conversion is c, in which case the default separator is null.
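
The output rules can be approximated in standard C for readers without an
extended printf. The helper format_v4si below is hypothetical and is not
part of libvecprintf or of any library interface named above. It sketches
what a vl-qualified %d conversion might produce for a vector of four ints,
under the assumption that components are written from element 0 to
element 3 with the separator between adjacent components.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format four int components with %d, placing sep
 * between adjacent components. Pass " " for the default separator or ""
 * for a null separator. Returns the snprintf result. */
static int format_v4si(char *buf, size_t size, const int elems[4],
                       const char *sep)
{
    return snprintf(buf, size, "%d%s%d%s%d%s%d",
                    elems[0], sep, elems[1], sep, elems[2], sep, elems[3]);
}

/* Usage: returns 1 when the default and comma separators produce the
 * expected strings. */
static int format_demo(void)
{
    const int v[4] = {1, -2, 3, 4};
    char spaced[64];
    char comma[64];

    format_v4si(spaced, sizeof spaced, v, " ");
    format_v4si(comma, sizeof comma, v, ",");
    return strcmp(spaced, "1 -2 3 4") == 0 &&
           strcmp(comma, "1,-2,3,4") == 0;
}
```

An implementation of the extension would accept the separator and size
qualifier inside the format string itself; the exact syntax and element
order are defined by that implementation, not by this sketch.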