Vector Programming Techniques
Help the Compiler Help You
Start with scalar code, which is the most portable, and give the compiler the information it needs to vectorize that code automatically. Align your data on 16-byte boundaries wherever possible, and tell the compiler that it is aligned. Use __restrict__ pointers to promise that the data does not alias.
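As a minimal sketch of the advice above, the following function combines both hints. The function name add_arrays is hypothetical, and __builtin_assume_aligned is a GCC and Clang builtin rather than one of the intrinsics defined in this manual, so other compilers may need a different mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n).
 * The __restrict__ qualifiers promise that the three arrays do not overlap. */
void add_arrays(float *__restrict__ dst,
                const float *__restrict__ a,
                const float *__restrict__ b,
                size_t n)
{
    /* Debug-build check that callers honor the 16-byte alignment promise. */
    assert((uintptr_t)dst % 16 == 0);
    assert((uintptr_t)a % 16 == 0);
    assert((uintptr_t)b % 16 == 0);

    /* GCC/Clang builtin (an assumption about the toolchain): tells the
     * optimizer that each pointer is 16-byte aligned. */
    float *d = __builtin_assume_aligned(dst, 16);
    const float *x = __builtin_assume_aligned(a, 16);
    const float *y = __builtin_assume_aligned(b, 16);

    /* Plain scalar loop; with the hints above the compiler is free to
     * vectorize it without alias checks or unaligned-access fallbacks. */
    for (size_t i = 0; i < n; i++)
        d[i] = x[i] + y[i];
}
```

Callers would typically declare their buffers with an alignment attribute, for example float buf[1024] __attribute__((aligned(16))), so that the promise made inside the function actually holds.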
Use Portable Intrinsics
Individual compilers may support additional intrinsics, but only the intrinsics in this manual are guaranteed to be portable across compliant compilers. Some compilers also provide compatibility headers for use with other architectures. Recent GCC and Clang compilers support compatibility headers for the lower levels of the x86 vector architecture. These can be used initially for ease of porting, but for best performance it is preferable to rewrite important sections of code with native Power intrinsics.
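A short sketch of what code written with native intrinsics might look like follows. The function name add_f32 is hypothetical, and the scalar fallback branch is an addition for illustration so that the same file still builds on targets without VSX; it is not something this manual requires.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__VSX__)
#include <altivec.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i]; n must be a multiple of 4. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);
#if defined(__VSX__)
    /* Native path: process four floats per iteration. */
    for (size_t i = 0; i < n; i += 4) {
        vector float va = vec_xl(0, a + i);    /* load 4 floats from a */
        vector float vb = vec_xl(0, b + i);    /* load 4 floats from b */
        vec_xst(vec_add(va, vb), 0, dst + i);  /* store the 4 sums */
    }
#else
    /* Fallback for targets without VSX (illustration only). */
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
#endif
}
```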
Use Assembly Code Sparingly filler
Inline Assembly filler
Assembly Files filler
Other Vector Programming APIs
In addition to the intrinsic functions provided in this reference, programmers should be aware of other vector programming API resources.
x86 Vector Portability Headers
Recent versions of the GCC and Clang open source compilers provide "drop-in" portability headers for portions of the Intel Architecture Instruction Set Extensions (see ). These headers mirror the APIs of the Intel headers that have the same names. Support is provided for the MMX and SSE layers, up through SSE4. At this time, no support for the AVX layers is envisioned.
The portability headers provide the same semantics as the corresponding Intel APIs, but use VMX and VSX instructions to emulate the Intel vector instructions. These headers are provided for portability and will not necessarily perform optimally, although in many cases the performance is very good. Using them is often a good first step in porting a library that uses Intel intrinsics to POWER, after which more detailed rewriting of algorithms is usually desirable for best performance.
The portability APIs become available automatically when a program includes one of the corresponding Intel header files, such as <mmintrin.h>.
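The sketch below shows the kind of SSE code that such a header is meant to carry over unchanged. The function name scale_f32 is hypothetical. The NO_WARN_X86_INTRINSICS macro is an assumption based on the behavior of the GCC headers, which are understood to stop with an error on Power until the macro is defined; check your compiler's headers before relying on it.

```c
/* Assumption: GCC's Power versions of the x86 headers expect this macro
 * as an acknowledgement that the intrinsics are emulated. */
#if defined(__powerpc64__) && !defined(NO_WARN_X86_INTRINSICS)
#define NO_WARN_X86_INTRINSICS 1
#endif

#include <xmmintrin.h>  /* SSE API; emulated with VMX/VSX on Power */
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: multiplies n floats in place by factor.
 * n must be a multiple of 4. */
void scale_f32(float *data, float factor, size_t n)
{
    assert(n % 4 == 0);
    __m128 f = _mm_set1_ps(factor);              /* broadcast the factor */
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);       /* unaligned 4-float load */
        _mm_storeu_ps(data + i, _mm_mul_ps(v, f));
    }
}
```

The same source compiles on an x86 system with the Intel header and, under the assumption stated above, on Power with the portability header, which is what makes these headers useful as a first porting step.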
The POWER Vector Library (pveclib)
The POWER Vector Library, also known as pveclib, is a separate project available from GitHub (see ). The pveclib project builds on top of the intrinsics described in this manual to provide higher-level vector interfaces that are highly portable. The goals of the project include:
Providing equivalent functions across versions of the PowerISA. For example, the Vector Multiply-by-10 Unsigned Quadword operation introduced in PowerISA 3.0 (POWER9) can be implemented using a few vector instructions on earlier PowerISA versions.
Providing equivalent functions across compiler versions. For example, intrinsics provided in later versions of the compiler can be implemented as inline functions with inline asm in earlier compiler versions.
Providing higher-order functions not provided directly by the PowerISA. One example is a vector SIMD implementation for ASCII __isalpha and similar functions. Another example is full __int128 implementations of Count Leading Zeros, Population Count, and Multiply.
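To illustrate what a "higher-order" 128-bit operation means, the scalar sketch below builds quadword Population Count and Count Leading Zeros from 64-bit pieces. This is not pveclib's API or implementation; the names u128, popcount_u128, and clz_u128 are hypothetical, and the code assumes a GCC or Clang compiler on a 64-bit target that provides unsigned __int128.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the compiler provides a 128-bit integer type. */
typedef unsigned __int128 u128;
static_assert(sizeof(u128) == 16, "expects a 128-bit integer type");

/* Hypothetical helper: number of 1 bits in a 128-bit value. */
static inline int popcount_u128(u128 x)
{
    return __builtin_popcountll((uint64_t)(x >> 64))
         + __builtin_popcountll((uint64_t)x);
}

/* Hypothetical helper: number of leading zero bits in a 128-bit value.
 * Returns 128 for zero, since __builtin_clzll(0) is undefined. */
static inline int clz_u128(u128 x)
{
    uint64_t hi = (uint64_t)(x >> 64);
    uint64_t lo = (uint64_t)x;
    if (hi != 0)
        return __builtin_clzll(hi);
    if (lo != 0)
        return 64 + __builtin_clzll(lo);
    return 128;
}
```

A library such as pveclib aims to offer operations of this kind on vector registers behind one interface, so that callers do not need to know which PowerISA level or compiler version supplies the underlying instruction.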