diff --git a/Intrinsics_Reference/ch_biendian.xml b/Intrinsics_Reference/ch_biendian.xml index 5846956..6b27880 100644 --- a/Intrinsics_Reference/ch_biendian.xml +++ b/Intrinsics_Reference/ch_biendian.xml @@ -18,32 +18,32 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian"> - The POWER Bi-Endian Vector Programming Model + The Power Bi-Endian Vector Programming Model To ensure portability of applications optimized to exploit the - SIMD functions of POWER ISA processors, this reference defines a + SIMD functions of Power ISA processors, this reference defines a set of functions and data types for SIMD programming. Compliant compilers will provide suitable support for these functions, preferably as built-in functions that translate to one or more - POWER ISA instructions. + Power ISA instructions. Compilers are encouraged, but not required, to provide built-in - functions to access individual instructions in the IBM POWER® + functions to access individual instructions in the IBM Power® instruction set architecture. In most cases, each such built-in function should provide direct access to the underlying instruction. However, to ease porting between little-endian (LE) and big-endian - (BE) POWER systems, and between POWER and other platforms, it is + (BE) Power systems, and between Power and other platforms, it is preferable that some built-in functions provide the same semantics - on both LE and BE POWER systems, even if this means that the + on both LE and BE Power systems, even if this means that the built-in functions are implemented with different instruction sequences for LE and BE. To achieve this, vector built-in functions provide a set of functions derived from the set of - hardware functions provided by the POWER SIMD instructions. Unlike + hardware functions provided by the Power SIMD instructions. 
Unlike traditional “hardware intrinsic” built-in functions, no fixed mapping exists between these built-in functions and the generated hardware instruction sequence. Rather, the compiler is free to @@ -52,13 +52,13 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian"> built-in functions. - As we've seen, the POWER SIMD instructions operate on groups of 1, + As we've seen, the Power SIMD instructions operate on groups of 1, 2, 4, 8, or 16 vector elements at a time in 128-bit registers. On - a big-endian POWER platform, vector elements are loaded from + a big-endian Power platform, vector elements are loaded from memory into a register so that the 0th element occupies the high-order bits of the register, and the (N – 1)th element occupies the low-order bits of the register. This is referred to - as big-endian element order. On a little-endian POWER platform, + as big-endian element order. On a little-endian Power platform, vector elements are loaded from memory such that the 0th element occupies the low-order bits of the register, and the (N – 1)th element occupies the high-order bits. This is referred to as @@ -68,7 +68,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian"> Much of the information in this chapter was formerly part of - Chapter 6 of the 64-Bit ELF V2 ABI Specification for POWER. + Chapter 6 of the 64-Bit ELF V2 ABI Specification for Power. @@ -123,7 +123,7 @@ vector double g = (vector double) { 3.5, -24.6 }; For the C and C++ programming languages (and related/derived languages), these data types may be accessed based on the type names listed in when - POWER SIMD language extensions are enabled using either the + Power SIMD language extensions are enabled using either the vector or __vector keywords. 
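The big-endian and little-endian element orders described in the ch_biendian.xml hunk above can be modelled without Power hardware. The sketch below is illustrative only. It assumes a 128-bit register represented as two 64-bit halves (`hi` for bits 127:64, `lo` for bits 63:0) and a vector of four 32-bit elements. The type `reg128` and the functions `put_word`, `load_be_element_order`, and `load_le_element_order` are hypothetical and are not part of any compiler API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit vector register:
   hi holds register bits 127:64 (high-order), lo holds bits 63:0. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} reg128;

/* Write a 32-bit value into word slot 0..3 of the register, where
   slot 0 is the high-order word (bits 127:96) and slot 3 is the
   low-order word (bits 31:0). */
static void put_word(reg128 *r, int slot, uint32_t value)
{
    assert(slot >= 0 && slot < 4);
    uint64_t *half = (slot < 2) ? &r->hi : &r->lo;
    int shift = (slot % 2 == 0) ? 32 : 0;
    *half &= ~((uint64_t)0xFFFFFFFFu << shift);
    *half |= (uint64_t)value << shift;
}

/* Big-endian element order: element 0 occupies the high-order bits
   and element N-1 (here 3) occupies the low-order bits. */
static reg128 load_be_element_order(const uint32_t elems[4])
{
    reg128 r = {0, 0};
    for (int i = 0; i < 4; i++)
        put_word(&r, i, elems[i]);
    return r;
}

/* Little-endian element order: element 0 occupies the low-order bits
   and element N-1 occupies the high-order bits. */
static reg128 load_le_element_order(const uint32_t elems[4])
{
    reg128 r = {0, 0};
    for (int i = 0; i < 4; i++)
        put_word(&r, 3 - i, elems[i]);
    return r;
}
```

Consider the input elements {0x00000000, 0x11111111, 0x22222222, 0x33333333}. With these, `load_be_element_order` yields hi = 0x0000000011111111 and lo = 0x2222222233333333. By contrast, `load_le_element_order` yields hi = 0x3333333322222222 and lo = 0x1111111100000000. The array in memory is the same in both cases, but element 0 ends up at opposite ends of the register.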
@@ -478,7 +478,7 @@ register vector double vd = vec_splats(*double_ptr); Vector Operators In addition to the dereference and assignment operators, the - POWER Bi-Endian Vector Programming Model provides the usual + Power Bi-Endian Vector Programming Model provides the usual operators that are valid on pointers; these operators are also valid for pointers to vector types. @@ -589,7 +589,7 @@ register vector double vd = vec_splats(*double_ptr);
Vector Built-In Functions - Some of the POWER SIMD hardware instructions refer, implicitly + Some of the Power SIMD hardware instructions refer, implicitly or explicitly, to vector element numbers. For example, the vspltb instruction has as one of its inputs an index into a vector. The element at that index position is to @@ -650,7 +650,7 @@ register vector double vd = vec_splats(*double_ptr); - Corresponding POWER + Corresponding Power Instructions @@ -761,7 +761,7 @@ register vector double vd = vec_splats(*double_ptr); (Deprecated) Versions 1.0 through 1.4 of the 64-Bit ELFv2 ABI Specification - for POWER provided for optional compiler support for using + for Power provided for optional compiler support for using big-endian element ordering in little-endian environments. This was initially deemed useful for porting certain libraries that assumed big-endian element ordering regardless of the diff --git a/Intrinsics_Reference/ch_intro.xml b/Intrinsics_Reference/ch_intro.xml index 49a1946..ca9052a 100644 --- a/Intrinsics_Reference/ch_intro.xml +++ b/Intrinsics_Reference/ch_intro.xml @@ -18,12 +18,12 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro"> - Introduction to Vector Programming on POWER + Introduction to Vector Programming on Power
A Brief History - The history of vector programming on POWER processors begins + The history of vector programming on Power processors begins with the AIM (Apple, IBM, Motorola) alliance in the 1990s. The AIM partners developed the Power Vector Media Extension (VMX) to accelerate multimedia applications, particularly image @@ -87,15 +87,15 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro"> a VSR can now contain a single 128-bit integer; and starting with POWER9, a VSR can contain a single 128-bit floating-point value. The VMX and VSX instruction sets together may be - referred to as the POWER SIMD (single-instruction, + referred to as the Power SIMD (single-instruction, multiple-data) instructions.
Little-Endian Linux - The POWER architecture has supported operation in either + The Power architecture has supported operation in either big-endian (BE) or little-endian (LE) mode from the - beginning. However, IBM's POWER servers were only shipped + beginning. However, IBM's Power servers were only shipped with big-endian operating systems (AIX, Linux, i5/OS) prior to the introduction of POWER8. With POWER8, IBM began supporting little-endian Linux distributions for the first @@ -106,7 +106,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro"> currently used only for little-endian Linux. - Although POWER has always supported big- and little-endian + Although Power has always supported big- and little-endian memory accesses, the introduction of vector register support added a layer of complexity to programming for processors operating in different endian modes. Arrays of elements @@ -137,7 +137,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro"> The vector-scalar registers can be addressed with VSX instructions, for vector and scalar processing of all 64 - registers, or with the "classic" POWER floating-point + registers, or with the "classic" Power floating-point instructions to refer to a 32-register subset of these, having 64 bits per register. They can also be addressed with VMX instructions to refer to a 32-register subset of 128-bit registers. @@ -198,6 +198,16 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro"> + + + Power Instruction Set Architecture, + Version 3.0B Specification. + + https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0 + + + + Power Vector Library. 
diff --git a/Intrinsics_Reference/ch_techniques.xml b/Intrinsics_Reference/ch_techniques.xml index 892c5f9..5cab64e 100644 --- a/Intrinsics_Reference/ch_techniques.xml +++ b/Intrinsics_Reference/ch_techniques.xml @@ -31,11 +31,11 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques"> intrinsics the best way to ensure that the compiler does exactly what you want? Well, sometimes. But the problem is that the best instruction sequence today may not be the best instruction - sequence tomorrow. As the PowerISA moves forward, new + sequence tomorrow. As the Power ISA moves forward, new instruction capabilities appear, and the old code you wrote can easily become obsolete. Then you start having to create different versions of the code for different levels of the - PowerISA, and it can quickly become difficult to maintain. + Power ISA, and it can quickly become difficult to maintain. Most often programmers use vector intrinsics to increase the @@ -141,7 +141,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques"> This reference provides intrinsics that are guaranteed to be portable across compliant compilers. In particular, both the - GCC and Clang compilers for POWER implement the intrinsics in + GCC and Clang compilers for Power implement the intrinsics in this manual. The compilers may each implement many more intrinsics, but the ones in this manual are the only ones guaranteed to be portable. So if you are using an interface not @@ -151,7 +151,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques"> There are also other vector APIs that may be of use to you (see ). In particular, the - POWER Vector Library (see ) provides additional portability across compiler versions. @@ -221,7 +221,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques"> and will not necessarily perform optimally (although in many cases the performance is very good). 
Using these headers is often a good first step in porting a library using Intel - intrinsics to POWER, after which more detailed rewriting of + intrinsics to Power, after which more detailed rewriting of algorithms is usually desirable for best performance. @@ -231,8 +231,8 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
- The POWER Vector Library (pveclib) - The POWER Vector Library, also known as + The Power Vector Library (pveclib) + The Power Vector Library, also known as pveclib, is a separate project available from github (see ). The pveclib project builds on top of the intrinsics @@ -244,10 +244,10 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques"> Providing equivalent functions across versions of the - PowerISA. For example, the Vector + Power ISA. For example, the Vector Multiply-by-10 Unsigned Quadword operation - introduced in PowerISA 3.0 (POWER9) can be implemented - using a few vector instructions on earlier PowerISA + introduced in Power ISA 3.0 (POWER9) can be implemented + using a few vector instructions on earlier Power ISA versions. @@ -262,7 +262,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques"> Providing higher-order functions not provided directly by - the PowerISA. One example is a vector SIMD implementation + the Power ISA. One example is a vector SIMD implementation for ASCII __isalpha and similar functions. Another example is full __int128 implementations of Count Leading diff --git a/Intrinsics_Reference/ch_vec_reference.xml b/Intrinsics_Reference/ch_vec_reference.xml index d142c0e..466fba7 100644 --- a/Intrinsics_Reference/ch_vec_reference.xml +++ b/Intrinsics_Reference/ch_vec_reference.xml @@ -15594,15 +15594,28 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.vec-ref"> r is set to the value of the ith bit of the jth byte element of a. + + , taken from the + Power ISA, shows how bits are combined by the + vec_gb intrinsic. Here VR[VRT] is + equivalent to r, and + VR[VRB] is equivalent to a. + +
+ Operation of vec_gb + + + + + +
Endian considerations: The vec_gb intrinsic function assumes big-endian (left-to-right) numbering for both bits and bytes, matching the ISA 2.07 vgbbd instruction. - Notes: - Try to get the diagram from the ISA manual to include - here. - vgbbd diff --git a/Intrinsics_Reference/vgbbd.png b/Intrinsics_Reference/vgbbd.png new file mode 100644 index 0000000..053ef58 Binary files /dev/null and b/Intrinsics_Reference/vgbbd.png differ
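The vec_gb semantics described in the ch_vec_reference.xml hunk above can be checked on any host with a scalar reference model. The following C sketch illustrates the operation but is not how a compiler implements the intrinsic. It treats a vector as a 16-byte array in big-endian byte order, with a[0] as byte element 0. It also uses big-endian bit numbering, where bit 0 is the most significant bit, as the endian considerations require. The names `vec_gb_model` and `be_bit` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit k of byte b using big-endian bit numbering (bit 0 is the MSB). */
static unsigned be_bit(uint8_t b, int k)
{
    return (b >> (7 - k)) & 1u;
}

/* Scalar model of vec_gb on a 16-byte vector stored in big-endian
   (left-to-right) byte order, a[0] being byte element 0.
   Within each doubleword d, bit j of byte i of r is set to bit i
   of byte j of a: an 8x8 bit-matrix transpose per doubleword. */
static void vec_gb_model(uint8_t r[16], const uint8_t a[16])
{
    /* r is written while a is still being read, so they must not alias. */
    assert(r != a);
    for (int d = 0; d < 2; d++) {
        for (int i = 0; i < 8; i++) {
            uint8_t out = 0;
            for (int j = 0; j < 8; j++)
                out |= (uint8_t)(be_bit(a[8 * d + j], i) << (7 - j));
            r[8 * d + i] = out;
        }
    }
}
```

Within each doubleword, the operation is an 8x8 bit transpose, so applying `vec_gb_model` twice returns the original bytes. For example, take an input whose first byte is 0xFF and whose other first-doubleword bytes are zero. For that input, each of the first eight result bytes is 0x80.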