Changed to consistently use Power versus POWER, Power ISA versus PowerISA, etc. Added graphic to vec_gb.
pull/30/head
Bill Schmidt 5 years ago
parent 4b974079d2
commit 029c89866f

@@ -18,32 +18,32 @@
 xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
 <!-- Chapter Title goes here. -->
-<title>The POWER Bi-Endian Vector Programming Model</title>
+<title>The Power Bi-Endian Vector Programming Model</title>


 <para>
 To ensure portability of applications optimized to exploit the
-SIMD functions of POWER ISA processors, this reference defines a
+SIMD functions of Power ISA processors, this reference defines a
 set of functions and data types for SIMD programming. Compliant
 compilers will provide suitable support for these functions,
 preferably as built-in functions that translate to one or more
-POWER ISA instructions.
+Power ISA instructions.
 </para>
 <para>
 Compilers are encouraged, but not required, to provide built-in
-functions to access individual instructions in the IBM POWER®
+functions to access individual instructions in the IBM Power®
 instruction set architecture. In most cases, each such built-in
 function should provide direct access to the underlying
 instruction.
 </para>
 <para>
 However, to ease porting between little-endian (LE) and big-endian
-(BE) POWER systems, and between POWER and other platforms, it is
+(BE) Power systems, and between Power and other platforms, it is
 preferable that some built-in functions provide the same semantics
-on both LE and BE POWER systems, even if this means that the
+on both LE and BE Power systems, even if this means that the
 built-in functions are implemented with different instruction
 sequences for LE and BE. To achieve this, vector built-in
 functions provide a set of functions derived from the set of
-hardware functions provided by the POWER SIMD instructions. Unlike
+hardware functions provided by the Power SIMD instructions. Unlike
 traditional “hardware intrinsic” built-in functions, no fixed
 mapping exists between these built-in functions and the generated
 hardware instruction sequence. Rather, the compiler is free to
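
One illustration of this freedom (a minimal sketch, assuming GCC or Clang with VSX enabled and <altivec.h>; the exact instruction selection is up to the compiler): the same built-in may be lowered to a "merge high" instruction on a big-endian target and to a "merge low" with swapped operands on a little-endian target, so that its element-order semantics are identical on both.

    #include <altivec.h>

    /* Element 0 of the result is always formed from element 0 of the
       inputs, regardless of which merge instruction the compiler picks
       for the target endianness. */
    vector signed int merge_high(vector signed int a, vector signed int b)
    {
        return vec_mergeh(a, b);
    }
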
@@ -52,13 +52,13 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
 built-in functions.
 </para>
 <para>
-As we've seen, the POWER SIMD instructions operate on groups of 1,
+As we've seen, the Power SIMD instructions operate on groups of 1,
 2, 4, 8, or 16 vector elements at a time in 128-bit registers. On
-a big-endian POWER platform, vector elements are loaded from
+a big-endian Power platform, vector elements are loaded from
 memory into a register so that the 0th element occupies the
 high-order bits of the register, and the (N &#8211; 1)th element
 occupies the low-order bits of the register. This is referred to
-as big-endian element order. On a little-endian POWER platform,
+as big-endian element order. On a little-endian Power platform,
 vector elements are loaded from memory such that the 0th element
 occupies the low-order bits of the register, and the (N &#8211;
 1)th element occupies the high-order bits. This is referred to as
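
A minimal sketch of what element order means to the programmer (assuming GCC or Clang with VSX enabled and <altivec.h>): the function-level element numbering is the same on BE and LE, even though the elements land at opposite ends of the register.

    #include <altivec.h>
    #include <stdio.h>

    int main(void)
    {
        int a[4] = { 10, 20, 30, 40 };
        vector signed int v = vec_xl(0, a);   /* load a[0..3] into a vector */

        /* v[0] is 10 on both BE and LE; only its position within the
           128-bit register differs between the two modes. */
        for (int i = 0; i < 4; i++)
            printf("element %d = %d\n", i, v[i]);
        return 0;
    }
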
@@ -68,7 +68,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
 <note>
 <para>
 Much of the information in this chapter was formerly part of
-Chapter 6 of the 64-Bit ELF V2 ABI Specification for POWER.
+Chapter 6 of the 64-Bit ELF V2 ABI Specification for Power.
 </para>
 </note>

@@ -123,7 +123,7 @@ vector double g = (vector double) { 3.5, -24.6 };</programlisting>
 For the C and C++ programming languages (and related/derived
 languages), these data types may be accessed based on the type
 names listed in <xref linkend="VIPR.biendian.vectypes" /> when
-POWER SIMD language extensions are enabled using either the
+Power SIMD language extensions are enabled using either the
 <code>vector</code> or <code>__vector</code> keywords.
 </para>
 <para>
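
For example (a sketch assuming GCC or Clang with the SIMD language extensions enabled, e.g. -maltivec or -mvsx):

    #include <altivec.h>

    /* Either spelling declares a 128-bit vector type; __vector is the
       safer choice in code that also uses "vector" as an ordinary
       identifier (for example, C++ code using std::vector). */
    vector signed int      vi = { 1, 2, 3, 4 };
    __vector double        vd = { 3.5, -24.6 };
    __vector unsigned char vc = { 0 };   /* zero-initialized 16 bytes */
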
@@ -478,7 +478,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
 <title>Vector Operators</title>
 <para>
 In addition to the dereference and assignment operators, the
-POWER Bi-Endian Vector Programming Model provides the usual
+Power Bi-Endian Vector Programming Model provides the usual
 operators that are valid on pointers; these operators are also
 valid for pointers to vector types.
 </para>
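
A short sketch of the pointer arithmetic this implies: incrementing a pointer to a vector type advances it by the full 16-byte vector, just as with any other C pointer type.

    #include <altivec.h>

    void pointer_sketch(void)
    {
        double buf[8] __attribute__((aligned(16))) =
            { 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0 };

        __vector double *p = (__vector double *) buf;
        __vector double first  = *p;         /* buf[0], buf[1] */
        __vector double second = *(p + 1);   /* buf[2], buf[3]; p + 1 is 16 bytes later */
        (void) first;
        (void) second;
    }
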
@@ -589,7 +589,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
 <section>
 <title>Vector Built-In Functions</title>
 <para>
-Some of the POWER SIMD hardware instructions refer, implicitly
+Some of the Power SIMD hardware instructions refer, implicitly
 or explicitly, to vector element numbers. For example, the
 <code>vspltb</code> instruction has as one of its inputs an
 index into a vector. The element at that index position is to
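
The corresponding built-in looks like this (a sketch assuming <altivec.h>; the index argument must be a compile-time constant):

    #include <altivec.h>

    vector unsigned char splat_sketch(vector unsigned char src)
    {
        /* Replicate the element at index 3 of src into every byte of the
           result; the bi-endian model numbers that element the same way
           on BE and LE targets. */
        return vec_splat(src, 3);
    }
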
@@ -650,7 +650,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
 </entry>
 <entry>
 <para>
-<emphasis role="bold">Corresponding POWER
+<emphasis role="bold">Corresponding Power
 Instructions</emphasis>
 </para>
 </entry>
@@ -761,7 +761,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
 (Deprecated)</title>
 <para>
 Versions 1.0 through 1.4 of the 64-Bit ELFv2 ABI Specification
-for POWER provided for optional compiler support for using
+for Power provided for optional compiler support for using
 big-endian element ordering in little-endian environments.
 This was initially deemed useful for porting certain libraries
 that assumed big-endian element ordering regardless of the

@@ -18,12 +18,12 @@
 xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
 <!-- Chapter Title goes here. -->
-<title>Introduction to Vector Programming on POWER</title>
+<title>Introduction to Vector Programming on Power</title>


 <section>
 <title>A Brief History</title>
 <para>
-The history of vector programming on POWER processors begins
+The history of vector programming on Power processors begins
 with the AIM (Apple, IBM, Motorola) alliance in the 1990s. The
 AIM partners developed the Power Vector Media Extension (VMX) to
 accelerate multimedia applications, particularly image
@@ -87,15 +87,15 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
 a VSR can now contain a single 128-bit integer; and starting
 with POWER9, a VSR can contain a single 128-bit floating-point
 value. The VMX and VSX instruction sets together may be
-referred to as the POWER SIMD (single-instruction,
+referred to as the Power SIMD (single-instruction,
 multiple-data) instructions.
 </para>
 <section>
 <title>Little-Endian Linux</title>
 <para>
-The POWER architecture has supported operation in either
+The Power architecture has supported operation in either
 big-endian (BE) or little-endian (LE) mode from the
-beginning. However, IBM's POWER servers were only shipped
+beginning. However, IBM's Power servers were only shipped
 with big-endian operating systems (AIX, Linux, i5/OS) prior to
 the introduction of POWER8. With POWER8, IBM began
 supporting little-endian Linux distributions for the first
@@ -106,7 +106,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
 currently used only for little-endian Linux.
 </para>
 <para>
-Although POWER has always supported big- and little-endian
+Although Power has always supported big- and little-endian
 memory accesses, the introduction of vector register support
 added a layer of complexity to programming for processors
 operating in different endian modes. Arrays of elements
@@ -137,7 +137,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
 <para>
 The vector-scalar registers can be addressed with VSX
 instructions, for vector and scalar processing of all 64
-registers, or with the "classic" POWER floating-point
+registers, or with the "classic" Power floating-point
 instructions to refer to a 32-register subset of these, having
 64 bits per register. They can also be addressed with VMX
 instructions to refer to a 32-register subset of 128-bit registers.
@@ -198,6 +198,16 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
 </emphasis>
 </para>
 </listitem>
+<listitem>
+<para>
+<emphasis>Power Instruction Set Architecture</emphasis>,
+Version 3.0B Specification.
+<emphasis>
+<link xlink:href="https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0">https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
+</link>
+</emphasis>
+</para>
+</listitem>
 <listitem>
 <para>
 <emphasis>Power Vector Library.</emphasis>

@@ -31,11 +31,11 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
 intrinsics the best way to ensure that the compiler does exactly
 what you want? Well, sometimes. But the problem is that the
 best instruction sequence today may not be the best instruction
-sequence tomorrow. As the PowerISA moves forward, new
+sequence tomorrow. As the Power ISA moves forward, new
 instruction capabilities appear, and the old code you wrote can
 easily become obsolete. Then you start having to create
 different versions of the code for different levels of the
-PowerISA, and it can quickly become difficult to maintain.
+Power ISA, and it can quickly become difficult to maintain.
 </para>
 <para>
 Most often programmers use vector intrinsics to increase the
@@ -141,7 +141,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
 <para>
 This reference provides intrinsics that are guaranteed to be
 portable across compliant compilers. In particular, both the
-GCC and Clang compilers for POWER implement the intrinsics in
+GCC and Clang compilers for Power implement the intrinsics in
 this manual. The compilers may each implement many more
 intrinsics, but the ones in this manual are the only ones
 guaranteed to be portable. So if you are using an interface not
@@ -151,7 +151,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
 <para>
 There are also other vector APIs that may be of use to you (see
 <xref linkend="VIPR.techniques.apis" />). In particular, the
-POWER Vector Library (see <xref
+Power Vector Library (see <xref
 linkend="VIPR.techniques.pveclib" />) provides additional
 portability across compiler versions.
 </para>
@@ -221,7 +221,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
 and will not necessarily perform optimally (although in many
 cases the performance is very good). Using these headers is
 often a good first step in porting a library using Intel
-intrinsics to POWER, after which more detailed rewriting of
+intrinsics to Power, after which more detailed rewriting of
 algorithms is usually desirable for best performance.
 </para>
 <para>
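
For example (a sketch assuming GCC's x86-compatibility headers for powerpc64le; defining NO_WARN_X86_INTRINSICS suppresses the porting notice those headers otherwise emit):

    #define NO_WARN_X86_INTRINSICS 1
    #include <emmintrin.h>

    /* SSE2 source compiled unchanged on Power: the compatibility header
       implements _mm_add_epi32 with VMX/VSX instructions. */
    __m128i add_four_ints(__m128i a, __m128i b)
    {
        return _mm_add_epi32(a, b);
    }
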
@@ -231,8 +231,8 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
 </para>
 </section>
 <section xml:id="VIPR.techniques.pveclib">
-<title>The POWER Vector Library (pveclib)</title>
-<para>The POWER Vector Library, also known as
+<title>The Power Vector Library (pveclib)</title>
+<para>The Power Vector Library, also known as
 <code>pveclib</code>, is a separate project available from
 github (see <xref linkend="VIPR.intro.links" />). The
 <code>pveclib</code> project builds on top of the intrinsics
@@ -244,10 +244,10 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
 <listitem>
 <para>
 Providing equivalent functions across versions of the
-PowerISA. For example, the <emphasis>Vector
+Power ISA. For example, the <emphasis>Vector
 Multiply-by-10 Unsigned Quadword</emphasis> operation
-introduced in PowerISA 3.0 (POWER9) can be implemented
-using a few vector instructions on earlier PowerISA
+introduced in Power ISA 3.0 (POWER9) can be implemented
+using a few vector instructions on earlier Power ISA
 versions.
 </para>
 </listitem>
@@ -262,7 +262,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
 <listitem>
 <para>
 Providing higher-order functions not provided directly by
-the PowerISA. One example is a vector SIMD implementation
+the Power ISA. One example is a vector SIMD implementation
 for ASCII <code>__isalpha</code> and similar functions.
 Another example is full <code>__int128</code>
 implementations of <emphasis>Count Leading
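
To give a flavor of such a higher-order function, here is a hand-written sketch (not pveclib's actual implementation) of an ASCII alphabetic test done sixteen bytes at a time:

    #include <altivec.h>

    /* Each result byte is all-ones where the corresponding input byte is
       an ASCII letter (A-Z or a-z) and all-zeros otherwise. */
    vector bool char simd_isalpha(vector unsigned char c)
    {
        /* Fold uppercase to lowercase by setting bit 0x20, then range-check. */
        vector unsigned char folded = vec_or(c, vec_splats((unsigned char) 0x20));
        return vec_and(vec_cmpge(folded, vec_splats((unsigned char) 'a')),
                       vec_cmple(folded, vec_splats((unsigned char) 'z')));
    }
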

@@ -15594,15 +15594,28 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.vec-ref">
 <emphasis role="bold">r</emphasis> is set to the value of the
 <emphasis>i</emphasis>th bit of the <emphasis>j</emphasis>th byte
 element of <emphasis role="bold">a</emphasis>.</para>
+<para>
+<xref linkend="VIPR.ch-vec.vec_gb" />, taken from the
+Power ISA, shows how bits are combined by the
+<code>vec_gb</code> intrinsic. Here <code>VR[VRT]</code> is
+equivalent to <emphasis role="bold">r</emphasis>, and
+<code>VR[VRB]</code> is equivalent to <emphasis
+role="bold">a</emphasis>.
+</para>
+<figure pgwide="1" xml:id="VIPR.ch-vec.vec_gb">
+<title>Operation of vec_gb</title>
+<mediaobject>
+<imageobject>
+<imagedata fileref="vgbbd.png" format="PNG"
+scalefit="1" width="100%" />
+</imageobject>
+</mediaobject>
+</figure>
 <para><emphasis role="bold">Endian considerations:</emphasis>
 The <emphasis role="bold">vec_gb</emphasis> intrinsic function assumes
 big-endian (left-to-right) numbering for both bits and bytes, matching
 the ISA 2.07 <emphasis role="bold">vgbbd</emphasis> instruction.
 </para>
-<para><emphasis role="bold">Notes:</emphasis>
-<emphasis>Try to get the diagram from the ISA manual to include
-here.</emphasis>
-</para>
 <indexterm>
 <primary>vgbbd</primary>

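A small usage sketch (assuming <altivec.h> and a POWER8 or later target, since vec_gb maps to the ISA 2.07 vgbbd instruction):

    #include <altivec.h>

    vector unsigned char gather_bits_sketch(void)
    {
        /* Every source byte is 0x80: only bit 0 is set in the ISA's
           big-endian bit numbering. */
        vector unsigned char a = vec_splats((unsigned char) 0x80);

        /* vec_gb transposes the 8x8 bit matrix within each doubleword:
           bit i of result byte j comes from bit j of source byte i.
           Here byte 0 of each doubleword becomes 0xFF and the other
           bytes become 0x00. */
        return vec_gb(a);
    }
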
Binary image file added (128 KiB); contents not shown.
