- The POWER Bi-Endian Vector Programming Model
+ The Power Bi-Endian Vector Programming Model
To ensure portability of applications optimized to exploit the
- SIMD functions of POWER ISA processors, this reference defines a
+ SIMD functions of Power ISA processors, this reference defines a
set of functions and data types for SIMD programming. Compliant
compilers will provide suitable support for these functions,
preferably as built-in functions that translate to one or more
- POWER ISA instructions.
+ Power ISA instructions.
Compilers are encouraged, but not required, to provide built-in
- functions to access individual instructions in the IBM POWER®
+ functions to access individual instructions in the IBM Power®
instruction set architecture. In most cases, each such built-in
function should provide direct access to the underlying
instruction.
However, to ease porting between little-endian (LE) and big-endian
- (BE) POWER systems, and between POWER and other platforms, it is
+ (BE) Power systems, and between Power and other platforms, it is
preferable that some built-in functions provide the same semantics
- on both LE and BE POWER systems, even if this means that the
+ on both LE and BE Power systems, even if this means that the
built-in functions are implemented with different instruction
sequences for LE and BE. To achieve this, vector built-in
functions provide a set of functions derived from the set of
- hardware functions provided by the POWER SIMD instructions. Unlike
+ hardware functions provided by the Power SIMD instructions. Unlike
traditional “hardware intrinsic” built-in functions, no fixed
mapping exists between these built-in functions and the generated
hardware instruction sequence. Rather, the compiler is free to
@@ -52,13 +52,13 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
built-in functions.
- As we've seen, the POWER SIMD instructions operate on groups of 1,
+ As we've seen, the Power SIMD instructions operate on groups of 1,
2, 4, 8, or 16 vector elements at a time in 128-bit registers. On
- a big-endian POWER platform, vector elements are loaded from
+ a big-endian Power platform, vector elements are loaded from
memory into a register so that the 0th element occupies the
high-order bits of the register, and the (N – 1)th element
occupies the low-order bits of the register. This is referred to
- as big-endian element order. On a little-endian POWER platform,
+ as big-endian element order. On a little-endian Power platform,
vector elements are loaded from memory such that the 0th element
occupies the low-order bits of the register, and the (N –
1)th element occupies the high-order bits. This is referred to as
@@ -68,7 +68,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
Much of the information in this chapter was formerly part of
- Chapter 6 of the 64-Bit ELF V2 ABI Specification for POWER.
+ Chapter 6 of the 64-Bit ELF V2 ABI Specification for Power.
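The element-order description above can be sketched in portable C. This is only a model, not compiler or hardware code: the helper name `first_register_byte`, the `element_order` enum, and the convention of numbering register bytes from the high-order end (byte 0 is most significant) are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical model, not part of any compiler API.
 * A 128-bit register is treated as 16 bytes numbered from the
 * high-order end (byte 0 = most significant byte). */
enum element_order { BE_ELEMENT_ORDER, LE_ELEMENT_ORDER };

/* Return the number of the first (highest-order) register byte
 * occupied by element `index`, for elements of `elem_bytes` bytes. */
static unsigned first_register_byte(unsigned index, unsigned elem_bytes,
                                    enum element_order order)
{
    assert(elem_bytes != 0 && 16u % elem_bytes == 0);
    unsigned count = 16u / elem_bytes;          /* N elements per register */
    assert(index < count);

    if (order == BE_ELEMENT_ORDER)
        return index * elem_bytes;              /* element 0 in high-order bits */
    return (count - 1u - index) * elem_bytes;   /* element 0 in low-order bits */
}
```

With four 4-byte elements, element 0 starts at byte 0 in big-endian element order and at byte 12 in little-endian element order, and element 3 is the reverse.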
@@ -123,7 +123,7 @@ vector double g = (vector double) { 3.5, -24.6 };
For the C and C++ programming languages (and related/derived
languages), these data types may be accessed based on the type
names listed in when
- POWER SIMD language extensions are enabled using either the
+ Power SIMD language extensions are enabled using either the
vector or __vector keywords.
@@ -478,7 +478,7 @@ register vector double vd = vec_splats(*double_ptr);
Vector Operators
In addition to the dereference and assignment operators, the
- POWER Bi-Endian Vector Programming Model provides the usual
+ Power Bi-Endian Vector Programming Model provides the usual
operators that are valid on pointers; these operators are also
valid for pointers to vector types.
@@ -589,7 +589,7 @@ register vector double vd = vec_splats(*double_ptr);
Vector Built-In Functions
- Some of the POWER SIMD hardware instructions refer, implicitly
+ Some of the Power SIMD hardware instructions refer, implicitly
or explicitly, to vector element numbers. For example, the
vspltb instruction has as one of its inputs an
index into a vector. The element at that index position is to
@@ -650,7 +650,7 @@ register vector double vd = vec_splats(*double_ptr);
- Corresponding POWER
+ Corresponding Power
Instructions
@@ -761,7 +761,7 @@ register vector double vd = vec_splats(*double_ptr);
(Deprecated)
Versions 1.0 through 1.4 of the 64-Bit ELFv2 ABI Specification
- for POWER provided for optional compiler support for using
+ for Power provided for optional compiler support for using
big-endian element ordering in little-endian environments.
This was initially deemed useful for porting certain libraries
that assumed big-endian element ordering regardless of the
diff --git a/Intrinsics_Reference/ch_intro.xml b/Intrinsics_Reference/ch_intro.xml
index 49a1946..ca9052a 100644
--- a/Intrinsics_Reference/ch_intro.xml
+++ b/Intrinsics_Reference/ch_intro.xml
@@ -18,12 +18,12 @@
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
- Introduction to Vector Programming on POWER
+ Introduction to Vector Programming on Power
A Brief History
- The history of vector programming on POWER processors begins
+ The history of vector programming on Power processors begins
with the AIM (Apple, IBM, Motorola) alliance in the 1990s. The
AIM partners developed the Power Vector Media Extension (VMX) to
accelerate multimedia applications, particularly image
@@ -87,15 +87,15 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
a VSR can now contain a single 128-bit integer; and starting
with POWER9, a VSR can contain a single 128-bit floating-point
value. The VMX and VSX instruction sets together may be
- referred to as the POWER SIMD (single-instruction,
+ referred to as the Power SIMD (single-instruction,
multiple-data) instructions.
Little-Endian Linux
- The POWER architecture has supported operation in either
+ The Power architecture has supported operation in either
big-endian (BE) or little-endian (LE) mode from the
- beginning. However, IBM's POWER servers were only shipped
+ beginning. However, IBM's Power servers were only shipped
with big-endian operating systems (AIX, Linux, i5/OS) prior to
the introduction of POWER8. With POWER8, IBM began
supporting little-endian Linux distributions for the first
@@ -106,7 +106,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
currently used only for little-endian Linux.
- Although POWER has always supported big- and little-endian
+ Although Power has always supported big- and little-endian
memory accesses, the introduction of vector register support
added a layer of complexity to programming for processors
operating in different endian modes. Arrays of elements
@@ -137,7 +137,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
The vector-scalar registers can be addressed with VSX
instructions, for vector and scalar processing of all 64
- registers, or with the "classic" POWER floating-point
+ registers, or with the "classic" Power floating-point
instructions to refer to a 32-register subset of these, having
64 bits per register. They can also be addressed with VMX
instructions to refer to a 32-register subset of 128-bit registers.
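The register overlay described above can be modeled with two small functions. This sketch assumes the numbering defined by the Power ISA, in which floating-point register n overlays the high-order doubleword of VSR n and VMX register n overlays VSR 32 + n. The helper names are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Hypothetical helpers, not part of any compiler API.
 * VSRs are numbered 0..63. */

/* Floating-point register n (0..31) is the high-order 64 bits of VSR n. */
static unsigned vsr_for_fpr(unsigned fpr)
{
    assert(fpr < 32u);
    return fpr;
}

/* VMX register n (0..31) is the full 128 bits of VSR 32 + n. */
static unsigned vsr_for_vr(unsigned vr)
{
    assert(vr < 32u);
    return 32u + vr;
}
```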
@@ -198,6 +198,16 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
+
+
+ Power Instruction Set Architecture,
+ Version 3.0B Specification.
+
+ https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
+
+
+
+ Power Vector Library.
diff --git a/Intrinsics_Reference/ch_techniques.xml b/Intrinsics_Reference/ch_techniques.xml
index 892c5f9..5cab64e 100644
--- a/Intrinsics_Reference/ch_techniques.xml
+++ b/Intrinsics_Reference/ch_techniques.xml
@@ -31,11 +31,11 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
intrinsics the best way to ensure that the compiler does exactly
what you want? Well, sometimes. But the problem is that the
best instruction sequence today may not be the best instruction
- sequence tomorrow. As the PowerISA moves forward, new
+ sequence tomorrow. As the Power ISA moves forward, new
instruction capabilities appear, and the old code you wrote can
easily become obsolete. Then you start having to create
different versions of the code for different levels of the
- PowerISA, and it can quickly become difficult to maintain.
+ Power ISA, and it can quickly become difficult to maintain.
Most often programmers use vector intrinsics to increase the
@@ -141,7 +141,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
This reference provides intrinsics that are guaranteed to be
portable across compliant compilers. In particular, both the
- GCC and Clang compilers for POWER implement the intrinsics in
+ GCC and Clang compilers for Power implement the intrinsics in
this manual. The compilers may each implement many more
intrinsics, but the ones in this manual are the only ones
guaranteed to be portable. So if you are using an interface not
@@ -151,7 +151,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
There are also other vector APIs that may be of use to you (see
). In particular, the
- POWER Vector Library (see ) provides additional
+ Power Vector Library (see ) provides additional
portability across compiler versions.
@@ -221,7 +221,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
and will not necessarily perform optimally (although in many
cases the performance is very good). Using these headers is
often a good first step in porting a library using Intel
- intrinsics to POWER, after which more detailed rewriting of
+ intrinsics to Power, after which more detailed rewriting of
algorithms is usually desirable for best performance.
@@ -231,8 +231,8 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
- The POWER Vector Library (pveclib)
- The POWER Vector Library, also known as
+ The Power Vector Library (pveclib)
+ The Power Vector Library, also known as
pveclib, is a separate project available from
github (see ). The
pveclib project builds on top of the intrinsics
@@ -244,10 +244,10 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
Providing equivalent functions across versions of the
- PowerISA. For example, the Vector
+ Power ISA. For example, the Vector
Multiply-by-10 Unsigned Quadword operation
- introduced in PowerISA 3.0 (POWER9) can be implemented
- using a few vector instructions on earlier PowerISA
+ introduced in Power ISA 3.0 (POWER9) can be implemented
+ using a few vector instructions on earlier Power ISA
versions.
@@ -262,7 +262,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
Providing higher-order functions not provided directly by
- the PowerISA. One example is a vector SIMD implementation
+ the Power ISA. One example is a vector SIMD implementation
for ASCII __isalpha and similar functions.
Another example is full __int128
implementations of Count Leading
diff --git a/Intrinsics_Reference/ch_vec_reference.xml b/Intrinsics_Reference/ch_vec_reference.xml
index d142c0e..466fba7 100644
--- a/Intrinsics_Reference/ch_vec_reference.xml
+++ b/Intrinsics_Reference/ch_vec_reference.xml
@@ -15594,15 +15594,28 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.vec-ref">
r is set to the value of the
ith bit of the jth byte
element of a.
+
+ , taken from the
+ Power ISA, shows how bits are combined by the
+ vec_gb intrinsic. Here VR[VRT] is
+ equivalent to r, and
+ VR[VRB] is equivalent to a.
+
+
Endian considerations:
The vec_gb intrinsic function assumes
big-endian (left-to-right) numbering for both bits and bytes, matching
the ISA 2.07 vgbbd instruction.
- Notes:
- Try to get the diagram from the ISA manual to include
- here.
- vgbbd
diff --git a/Intrinsics_Reference/vgbbd.png b/Intrinsics_Reference/vgbbd.png
new file mode 100644
index 0000000..053ef58
Binary files /dev/null and b/Intrinsics_Reference/vgbbd.png differ
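The bit gathering performed by vec_gb, as described in the ch_vec_reference.xml hunk above, can be modeled in portable C. This is a sketch under stated assumptions, not the compiler's implementation: the function name `model_gather_bits` is hypothetical, the vector is represented as 16 bytes with byte 0 most significant, and bit 0 is the most significant bit of each byte, following the big-endian numbering the text says vec_gb assumes. Each doubleword is treated as an 8x8 bit matrix and transposed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable model of the vec_gb bit gather.
 * a and r each hold 16 bytes, byte 0 most significant; bit 0 of a
 * byte is its most significant bit (big-endian numbering). */
static void model_gather_bits(const uint8_t a[16], uint8_t r[16])
{
    assert(a != r);                 /* the model does not support aliasing */
    memset(r, 0, 16);

    for (int d = 0; d < 2; d++) {           /* each doubleword separately */
        for (int j = 0; j < 8; j++) {       /* result byte / source bit   */
            for (int k = 0; k < 8; k++) {   /* result bit / source byte   */
                /* bit j of source byte k becomes bit k of result byte j */
                unsigned bit = (a[8 * d + k] >> (7 - j)) & 1u;
                r[8 * d + j] |= (uint8_t)(bit << (7 - k));
            }
        }
    }
}
```

Because the operation is a transpose within each doubleword, applying the model twice returns the original input, which makes a convenient sanity check.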