diff --git a/Intrinsics_Reference/ch_biendian.xml b/Intrinsics_Reference/ch_biendian.xml
index bbefa87..d313167 100644
--- a/Intrinsics_Reference/ch_biendian.xml
+++ b/Intrinsics_Reference/ch_biendian.xml
@@ -22,11 +22,11 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
To ensure portability of applications optimized to exploit the
- SIMD functions of POWER ISA processors, the ELF V2 ABI defines a
- set of functions and data types for SIMD programming. ELF
- V2-compliant compilers will provide suitable support for these
- functions, preferably as built-in functions that translate to one
- or more POWER ISA instructions.
+ SIMD functions of POWER ISA processors, this reference defines a
+ set of functions and data types for SIMD programming. Compliant
+ compilers will provide suitable support for these functions,
+ preferably as built-in functions that translate to one or more
+ POWER ISA instructions.
Compilers are encouraged, but not required, to provide built-in
@@ -43,27 +43,26 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
built-in functions are implemented with different instruction
sequences for LE and BE. To achieve this, vector built-in
functions provide a set of functions derived from the set of
- hardware functions provided by the Power vector SIMD
- instructions. Unlike traditional “hardware intrinsic” built-in
- functions, no fixed mapping exists between these built-in
- functions and the generated hardware instruction sequence. Rather,
- the compiler is free to generate optimized instruction sequences
- that implement the semantics of the program specified by the
- programmer using these built-in functions.
+ hardware functions provided by the POWER SIMD instructions. Unlike
+ traditional “hardware intrinsic” built-in functions, no fixed
+ mapping exists between these built-in functions and the generated
+ hardware instruction sequence. Rather, the compiler is free to
+ generate optimized instruction sequences that implement the
+ semantics of the program specified by the programmer using these
+ built-in functions.
- This is primarily applicable to the POWER SIMD instructions. As
- we've seen, this set of instructions operates on groups of 2, 4,
- 8, or 16 vector elements at a time in 128-bit registers. On a
- big-endian POWER platform, vector elements are loaded from memory
- into a register so that the 0th element occupies the high-order
- bits of the register, and the (N – 1)th element occupies the
- low-order bits of the register. This is referred to as big-endian
- element order. On a little-endian POWER platform, vector elements
- are loaded from memory such that the 0th element occupies the
- low-order bits of the register, and the (N – 1)th element
- occupies the high-order bits. This is referred to as little-endian
- element order.
+ As we've seen, the POWER SIMD instructions operate on groups of 1,
+ 2, 4, 8, or 16 vector elements at a time in 128-bit registers. On
+ a big-endian POWER platform, vector elements are loaded from
+ memory into a register so that the 0th element occupies the
+ high-order bits of the register, and the (N – 1)th element
+ occupies the low-order bits of the register. This is referred to
+ as big-endian element order. On a little-endian POWER platform,
+ vector elements are loaded from memory such that the 0th element
+ occupies the low-order bits of the register, and the (N –
+ 1)th element occupies the high-order bits. This is referred to as
+ little-endian element order.
@@ -74,6 +73,46 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
+ Language Elements
+
+ The C and C++ languages are extended with the new identifiers
+ vector, pixel, bool,
+ __vector, __pixel, and
+ __bool. These keywords are used to specify vector
+ data types (). Because
+ these identifiers may conflict with keywords in more recent C
+ and C++ language standards, compilers may implement these in one
+ of two ways.
+
+
+
+
+ __vector, __pixel,
+ __bool, and bool are defined as
+ keywords, with vector and pixel as
+ predefined macros that expand to __vector and
+ __pixel, respectively.
+
+
+
+
+ __vector, __pixel, and
+ __bool are defined as keywords in all contexts,
+ while vector, pixel, and
+ bool are treated as keywords only within the
+ context of a type declaration.
+
+
+
+
+ Vector literals may be specified using a type cast and a set of
+ literal initializers in parentheses or braces. For example,
+
+ vector int x = (vector int) (4, -1, 3, 6);
+vector double g = (vector double) { 3.5, -24.6 };
+
+
+ Vector Data Types
Languages provide support for the data types in
For the C and C++ programming languages (and related/derived
languages), these data types may be accessed based on the type
names listed in when
- Power ISA SIMD language extensions are enabled using either the
- vector or __vector keywords. [FIXME:
- We haven't talked about these at all. Need to borrow some
- description from the AltiVec PIM about the usage of vector,
- bool, and pixel, and supplement with the problems this causes
- with strict-ANSI C++. Maybe a separate section on "Language
- Elements" should precede this one.]
+ POWER SIMD language extensions are enabled using either the
+ vector or __vector keywords.
For the Fortran language,
such as vec_xl and vec_xst are
provided for unaligned data access.
+
+ One vector type may be cast to another vector type without
+ restriction. Such a cast is simply a reinterpretation of the
+ bits, and does not change the data.
+
Compilers are expected to recognize and optimize multiple
operations that can be optimized into a single hardware
@@ -252,6 +291,21 @@ register vector double vd = vec_splats(*double_ptr);
2^16 – 1.
+
+
+ vector pixel
+
+
+ 16
+
+
+ Quadword
+
+
+ Vector of 8 halfwords, each interpreted as a 1-bit
+ channel and three 5-bit channels.
+
+ vector unsigned int
@@ -424,11 +478,9 @@ register vector double vd = vec_splats(*double_ptr);
Vector Operators
In addition to the dereference and assignment operators, the
- Power SIMD Vector Programming API [FIXME: If we're going to use
- a term like this, let's use it consistently; also, SIMD and
- Vector are redundant] provides the usual operators that are
- valid on pointers; these operators are also valid for pointers
- to vector types.
+ POWER Bi-Endian Vector Programming Model provides the usual
+ operators that are valid on pointers; these operators are also
+ valid for pointers to vector types.
The traditional C/C++ operators are defined on vector types
@@ -580,7 +632,7 @@ register vector double vd = vec_splats(*double_ptr);
bits are discarded before performing a memory access. These
instructions load and store data in accordance with the
program's current endian mode, and do not need to be adapted
- by the compiler to reflect little-endian operating during code
+ by the compiler to reflect little-endian operation during code
generation.
@@ -683,7 +735,7 @@ register vector double vd = vec_splats(*double_ptr);
Previous versions of the VMX built-in functions defined
intrinsics to access the VMX instructions lvsl
and lvsr, which could be used in conjunction with
- vec_vperm and VMX load and store instructions for
+ vec_perm and VMX load and store instructions for
unaligned access. The vec_lvsl and
vec_lvsr interfaces are deprecated in accordance
with the interfaces specified here. For compatibility, the
@@ -694,12 +746,14 @@ register vector double vd = vec_splats(*double_ptr);
discouraged and usually results in worse performance. It is
recommended (but not required) that compilers issue a warning
when these functions are used in little-endian
- environments. It is recommended that programmers use the
- vec_xl and vec_xst vector built-in
- functions to access unaligned data streams. See the
- descriptions of these instructions in for further description and
- implementation details.
+ environments.
+
+
+ It is recommended that programmers use the vec_xl
+ and vec_xst vector built-in functions to access
+ unaligned data streams. See the descriptions of these
+ built-in functions in for further
+ details.
diff --git a/Intrinsics_Reference/ch_intro.xml b/Intrinsics_Reference/ch_intro.xml
index 2b8b693..b2bb054 100644
--- a/Intrinsics_Reference/ch_intro.xml
+++ b/Intrinsics_Reference/ch_intro.xml
@@ -128,12 +128,87 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
The Unified Vector Register Set
- filler
+
+ In OpenPOWER-compliant processors, floating-point and vector
+ operations are implemented using a unified vector-scalar model.
+ As shown in and , there are 64 vector-scalar registers; each
+ is 128 bits wide.
+
+
+ The vector-scalar registers can be addressed with VSX
+ instructions for vector and scalar processing of all 64
+ registers; with the "classic" POWER floating-point
+ instructions, which refer to a 32-register subset of these,
+ having 64 bits per register; or with VMX instructions, which
+ refer to a different 32-register subset of full 128-bit
+ registers.
+
+
+
Useful Links
- filler
+
+ The following documents provide additional reference materials.
+
+
+
+
+ 64-Bit ELF V2 ABI Specification - Power
+ Architecture.
+
+ https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture
+
+
+
+
+
+
+ AltiVec Technology Program Interface
+ Manual.
+
+ https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf
+
+
+
+
+
+
+ Intel Architecture Instruction Set Extensions and
+ Future Features Programming Reference.
+
+ https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
+
+
+
+
+
+
+ Power Vector Library.
+
+ https://github.com/open-power-sdk/pveclib
+
+
+
+
+
diff --git a/Intrinsics_Reference/ch_outline.xml b/Intrinsics_Reference/ch_outline.xml
deleted file mode 100644
index 429fbf9..0000000
--- a/Intrinsics_Reference/ch_outline.xml
+++ /dev/null
@@ -1,45 +0,0 @@
-
-
-
-
- Notes on what to include
-
-
-
- Rewrite the material from ABI Chapter 6
-
-
- Recommendations for different ways to create efficient vector
- code
-
-
- Portable: C,C++; tricks to help compiler vectorize code
-
-
- Use intrinsics
-
-
- Assembly code - not recommended, but if you must
-
-
-
-
-
-
-
diff --git a/Intrinsics_Reference/ch_techniques.xml b/Intrinsics_Reference/ch_techniques.xml
index 1f795f3..3f8f4c1 100644
--- a/Intrinsics_Reference/ch_techniques.xml
+++ b/Intrinsics_Reference/ch_techniques.xml
@@ -51,6 +51,92 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
Use Assembly Code Sparingly
filler
+
+ Inline Assembly
+ filler
+
+
+ Assembly Files
+ filler
+
+
+
+
+ Other Vector Programming APIs
+ In addition to the intrinsic functions provided in this
+ reference, programmers should be aware of other vector programming
+ API resources.
+
+ x86 Vector Portability Headers
+
+ Recent versions of the gcc and clang
+ open source compilers provide "drop-in" portability headers
+ for portions of the Intel Architecture Instruction Set
+ Extensions (see ). These
+ headers mirror the APIs of Intel headers having the same
+ names. Support is provided for the MMX and SSE layers, up
+ through SSE4. At this time, no support for the AVX layers is
+ envisioned.
+
+
+ The portability headers provide the same semantics as the
+ corresponding Intel APIs, but using VMX and VSX instructions
+ to emulate the Intel vector instructions. It should be
+ emphasized that these headers are provided for portability,
+ and will not necessarily perform optimally (although in many
+ cases the performance is very good). Using these headers is
+ often a good first step in porting a library using Intel
+ intrinsics to POWER, after which more detailed rewriting of
+ algorithms is usually desirable for best performance.
+
+
+ Access to the portability APIs occurs automatically when
+ including one of the corresponding Intel header files, such as
+ <mmintrin.h>.
+
+
+
+ The POWER Vector Library (pveclib)
+ The POWER Vector Library, also known as
+ pveclib, is a separate project available from
+ GitHub (see ). The
+ pveclib project builds on top of the intrinsics
+ described in this manual to provide higher-level vector
+ interfaces that are highly portable. The goals of the project
+ include:
+
+
+
+
+ Providing equivalent functions across versions of the
+ PowerISA. For example, the Vector
+ Multiply-by-10 Unsigned Quadword operation
+ introduced in PowerISA 3.0 (POWER9) can be implemented
+ using a few vector instructions on earlier PowerISA
+ versions.
+
+
+
+
+ Providing equivalent functions across compiler versions.
+ For example, intrinsics provided in later versions of the
+ compiler can be implemented as inline functions with
+ inline asm in earlier compiler versions.
+
+
+
+
+ Providing higher-order functions not provided directly by
+ the PowerISA. One example is a vector SIMD implementation
+ for ASCII __isalpha and similar functions.
+ Another example is full __int128
+ implementations of Count Leading
+ Zeroes, Population Count,
+ and Multiply.
+
+
+
+
diff --git a/Intrinsics_Reference/fig-fpr-vsr.png b/Intrinsics_Reference/fig-fpr-vsr.png
new file mode 100644
index 0000000..abf0689
Binary files /dev/null and b/Intrinsics_Reference/fig-fpr-vsr.png differ
diff --git a/Intrinsics_Reference/fig-vr-vsr.png b/Intrinsics_Reference/fig-vr-vsr.png
new file mode 100644
index 0000000..0368431
Binary files /dev/null and b/Intrinsics_Reference/fig-vr-vsr.png differ