diff --git a/Intrinsics_Reference/ch_biendian.xml b/Intrinsics_Reference/ch_biendian.xml index bbefa87..d313167 100644 --- a/Intrinsics_Reference/ch_biendian.xml +++ b/Intrinsics_Reference/ch_biendian.xml @@ -22,11 +22,11 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian"> To ensure portability of applications optimized to exploit the - SIMD functions of POWER ISA processors, the ELF V2 ABI defines a - set of functions and data types for SIMD programming. ELF - V2-compliant compilers will provide suitable support for these - functions, preferably as built-in functions that translate to one - or more POWER ISA instructions. + SIMD functions of POWER ISA processors, this reference defines a + set of functions and data types for SIMD programming. Compliant + compilers will provide suitable support for these functions, + preferably as built-in functions that translate to one or more + POWER ISA instructions. Compilers are encouraged, but not required, to provide built-in @@ -43,27 +43,26 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian"> built-in functions are implemented with different instruction sequences for LE and BE. To achieve this, vector built-in functions provide a set of functions derived from the set of - hardware functions provided by the Power vector SIMD - instructions. Unlike traditional “hardware intrinsic” built-in - functions, no fixed mapping exists between these built-in - functions and the generated hardware instruction sequence. Rather, - the compiler is free to generate optimized instruction sequences - that implement the semantics of the program specified by the - programmer using these built-in functions. + hardware functions provided by the POWER SIMD instructions. Unlike + traditional “hardware intrinsic” built-in functions, no fixed + mapping exists between these built-in functions and the generated + hardware instruction sequence. 
Rather, the compiler is free to + generate optimized instruction sequences that implement the + semantics of the program specified by the programmer using these + built-in functions. - This is primarily applicable to the POWER SIMD instructions. As - we've seen, this set of instructions operates on groups of 2, 4, - 8, or 16 vector elements at a time in 128-bit registers. On a - big-endian POWER platform, vector elements are loaded from memory - into a register so that the 0th element occupies the high-order - bits of the register, and the (N – 1)th element occupies the - low-order bits of the register. This is referred to as big-endian - element order. On a little-endian POWER platform, vector elements - are loaded from memory such that the 0th element occupies the - low-order bits of the register, and the (N – 1)th element - occupies the high-order bits. This is referred to as little-endian - element order. + As we've seen, the POWER SIMD instructions operate on groups of 1, + 2, 4, 8, or 16 vector elements at a time in 128-bit registers. On + a big-endian POWER platform, vector elements are loaded from + memory into a register so that the 0th element occupies the + high-order bits of the register, and the (N – 1)th element + occupies the low-order bits of the register. This is referred to + as big-endian element order. On a little-endian POWER platform, + vector elements are loaded from memory such that the 0th element + occupies the low-order bits of the register, and the (N – + 1)th element occupies the high-order bits. This is referred to as + little-endian element order. @@ -74,6 +73,46 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
+ Language Elements + + The C and C++ languages are extended to use new identifiers + vector, pixel, bool, + __vector, __pixel, and + __bool. These keywords are used to specify vector + data types (). Because + these identifiers may conflict with keywords in more recent C + and C++ language standards, compilers may implement these in one + of two ways. + + + + + __vector, __pixel, + __bool, and bool are defined as + keywords, with vector and pixel as + predefined macros that expand to __vector and + __pixel, respectively. + + + + + __vector, __pixel, and + __bool are defined as keywords in all contexts, + while vector, pixel, and + bool are treated as keywords only within the + context of a type declaration. + + + + + Vector literals may be specified using a type cast and a set of + literal initializers in parentheses or braces. For example, + + vector int x = (vector int) (4, -1, 3, 6); +vector double g = (vector double) { 3.5, -24.6 }; +
+ +
Vector Data Types Languages provide support for the data types in For the C and C++ programming languages (and related/derived languages), these data types may be accessed based on the type names listed in when - Power ISA SIMD language extensions are enabled using either the - vector or __vector keywords. [FIXME: - We haven't talked about these at all. Need to borrow some - description from the AltiVec PIM about the usage of vector, - bool, and pixel, and supplement with the problems this causes - with strict-ANSI C++. Maybe a separate section on "Language - Elements" should precede this one.] + POWER SIMD language extensions are enabled using either the + vector or __vector keywords. For the Fortran language, such as vec_xl and vec_xst are provided for unaligned data access. + + One vector type may be cast to another vector type without + restriction. Such a cast is simply a reinterpretation of the + bits, and does not change the data. + Compilers are expected to recognize and optimize multiple operations that can be optimized into a single hardware @@ -252,6 +291,21 @@ register vector double vd = vec_splats(*double_ptr); 216 – 1. + + + vector pixel + + + 16 + + + Quadword + + + Vector of 8 halfwords, each interpreted as a 1-bit + channel and three 5-bit channels. + + vector unsigned int @@ -424,11 +478,9 @@ register vector double vd = vec_splats(*double_ptr); Vector Operators In addition to the dereference and assignment operators, the - Power SIMD Vector Programming API [FIXME: If we're going to use - a term like this, let's use it consistently; also, SIMD and - Vector are redundant] provides the usual operators that are - valid on pointers; these operators are also valid for pointers - to vector types. + POWER Bi-Endian Vector Programming Model provides the usual + operators that are valid on pointers; these operators are also + valid for pointers to vector types. 
The traditional C/C++ operators are defined on vector types @@ -580,7 +632,7 @@ register vector double vd = vec_splats(*double_ptr); bits are discarded before performing a memory access. These instructions access load and store data in accordance with the program's current endian mode, and do not need to be adapted - by the compiler to reflect little-endian operating during code + by the compiler to reflect little-endian operation during code generation. @@ -683,7 +735,7 @@ register vector double vd = vec_splats(*double_ptr); Previous versions of the VMX built-in functions defined intrinsics to access the VMX instructions lvsl and lvsr, which could be used in conjunction with - vec_vperm and VMX load and store instructions for + vec_perm and VMX load and store instructions for unaligned access. The vec_lvsl and vec_lvsr interfaces are deprecated in accordance with the interfaces specified here. For compatibility, the @@ -694,12 +746,14 @@ register vector double vd = vec_splats(*double_ptr); discouraged and usually results in worse performance. It is recommended (but not required) that compilers issue a warning when these functions are used in little-endian - environments. It is recommended that programmers use the - vec_xl and vec_xst vector built-in - functions to access unaligned data streams. See the - descriptions of these instructions in for further description and - implementation details. + environments. + + + It is recommended that programmers use the vec_xl + and vec_xst vector built-in functions to access + unaligned data streams. See the descriptions of these + instructions in for further + description and implementation details.
diff --git a/Intrinsics_Reference/ch_intro.xml b/Intrinsics_Reference/ch_intro.xml index 2b8b693..b2bb054 100644 --- a/Intrinsics_Reference/ch_intro.xml +++ b/Intrinsics_Reference/ch_intro.xml @@ -128,12 +128,87 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
The Unified Vector Register Set - filler + + In OpenPOWER-compliant processors, floating-point and vector + operations are implemented using a unified vector-scalar model. + As shown in and , there are 64 vector-scalar registers; each + is 128 bits wide. + + + The vector-scalar registers can be addressed with VSX + instructions, for vector and scalar processing of all 64 + registers, or with the "classic" POWER floating-point + instructions to refer to a 32-register subset of these, having + 64 bits per register. They can also be addressed with VMX + instructions to refer to a 32-register subset of 128-bit registers. + +
+ Floating-Point Registers as Part of VSRs + + + + + +
+
+ Vector Registers as Part of VSRs + + + + + +
Useful Links - filler + + The following documents provide additional reference materials. + + + + + 64-Bit ELF V2 ABI Specification - Power + Architecture. + + https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture + + + + + + + AltiVec Technology Program Interface + Manual. + + https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf + + + + + + + Intel Architecture Instruction Set Extensions and + Future Features Programming Reference. + + https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf + + + + + + + Power Vector Library. + + https://github.com/open-power-sdk/pveclib + + + + +
diff --git a/Intrinsics_Reference/ch_outline.xml b/Intrinsics_Reference/ch_outline.xml deleted file mode 100644 index 429fbf9..0000000 --- a/Intrinsics_Reference/ch_outline.xml +++ /dev/null @@ -1,45 +0,0 @@ - - - - - Notes on what to include - - - - Rewrite the material from ABI Chapter 6 - - - Recommendations for different ways to create efficient vector - code - - - Portable: C,C++; tricks to help compiler vectorize code - - - Use intrinsics - - - Assembly code - not recommended, but if you must - - - - - - - diff --git a/Intrinsics_Reference/ch_techniques.xml b/Intrinsics_Reference/ch_techniques.xml index 1f795f3..3f8f4c1 100644 --- a/Intrinsics_Reference/ch_techniques.xml +++ b/Intrinsics_Reference/ch_techniques.xml @@ -51,6 +51,92 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
Use Assembly Code Sparingly filler +
+ Inline Assembly + Extended inline assembly (the compiler's asm statement) embeds short machine-instruction sequences directly in C or C++ functions. Prefer a vector built-in function whenever one exists: inline assembly hides the code from the compiler's optimizer, and it makes the programmer, rather than the compiler, responsible for endian-correct element numbering. +
+
+ Assembly Files + Complete functions may also be written in separate assembly source files that are assembled and linked with the rest of the program. Such functions must follow the conventions of the 64-Bit ELF V2 ABI for register usage, stack frames, and parameter passing, and the programmer assumes full responsibility for correct operation in both endian modes. +
+
+ +
+ Other Vector Programming APIs + In addition to the intrinsic functions provided in this + reference, programmers should be aware of other vector programming + API resources. +
+ x86 Vector Portability Headers + + Recent versions of the gcc and clang + open source compilers provide "drop-in" portability headers + for portions of the Intel Architecture Instruction Set + Extensions (see ). These + headers mirror the APIs of Intel headers having the same + names. Support is provided for the MMX and SSE layers, up + through SSE4. At this time, no support for the AVX layers is + envisioned. + + + The portability headers provide the same semantics as the + corresponding Intel APIs, but using VMX and VSX instructions + to emulate the Intel vector instructions. It should be + emphasized that these headers are provided for portability, + and will not necessarily perform optimally (although in many + cases the performance is very good). Using these headers is + often a good first step in porting a library using Intel + intrinsics to POWER, after which more detailed rewriting of + algorithms is usually desirable for best performance. + + + Access to the portability APIs occurs automatically when + including one of the corresponding Intel header files, such as + <mmintrin.h>. + +
+
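As a sketch of the drop-in behavior described above, a source file written against the Intel SSE2 API compiles unchanged on POWER through the portability headers (with recent GCC, defining NO_WARN_X86_INTRINSICS suppresses the advisory warning these headers otherwise emit):

```c
/* On POWER, this Intel-named header is implemented with VMX/VSX. */
#include <emmintrin.h>

/* Add two vectors of 64-bit integers and return the low element. */
long long sum_low(__m128i a, __m128i b)
{
  __m128i s = _mm_add_epi64(a, b);
  return _mm_cvtsi128_si64(s);
}
```

The same translation unit builds on x86 with the native Intel header, which is the portability property these headers exist to provide.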
+ The POWER Vector Library (pveclib) + The POWER Vector Library, also known as + pveclib, is a separate project available from + github (see ). The + pveclib project builds on top of the intrinsics + described in this manual to provide higher-level vector + interfaces that are highly portable. The goals of the project + include: + + + + + Providing equivalent functions across versions of the + PowerISA. For example, the Vector + Multiply-by-10 Unsigned Quadword operation + introduced in PowerISA 3.0 (POWER9) can be implemented + using a few vector instructions on earlier PowerISA + versions. + + + + + Providing equivalent functions across compiler versions. + For example, intrinsics provided in later versions of the + compiler can be implemented as inline functions with + inline asm in earlier compiler versions. + + + + + Providing higher-order functions not provided directly by + the PowerISA. One example is a vector SIMD implementation + for ASCII __isalpha and similar functions. + Another example is full __int128 + implementations of Count Leading + Zeroes, Population Count, + and Multiply. + + + +
diff --git a/Intrinsics_Reference/fig-fpr-vsr.png b/Intrinsics_Reference/fig-fpr-vsr.png new file mode 100644 index 0000000..abf0689 Binary files /dev/null and b/Intrinsics_Reference/fig-fpr-vsr.png differ diff --git a/Intrinsics_Reference/fig-vr-vsr.png b/Intrinsics_Reference/fig-vr-vsr.png new file mode 100644 index 0000000..0368431 Binary files /dev/null and b/Intrinsics_Reference/fig-vr-vsr.png differ