Resolve a number of comments from Paul Clarke, and one from Steve Munroe.

pull/69/head
Bill Schmidt 4 years ago
parent a37fc120a3
commit 2333bd8a72

@@ -80,9 +80,9 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
<code>__vector</code>, <code>__pixel</code>, and
<code>__bool</code>. These keywords are used to specify vector
data types (<xref linkend="VIPR.ch-data-types" />). Because
these identifiers may conflict with keywords in more recent
language standards for C and C++, compilers may implement these
in one of two ways.
</para>
<itemizedlist>
<listitem>
@@ -104,6 +104,16 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
</para>
</listitem>
</itemizedlist>
<para>
As a motivating example, the <emphasis
role="bold">vector</emphasis> token is used as a type in the
C++ Standard Template Library, and hence cannot be used as an
unrestricted keyword, but can be used in the context-sensitive
implementation. For example, <emphasis role="bold">vector
char</emphasis> is distinct from <emphasis
role="bold">std::vector</emphasis> in the context-sensitive
implementation.
</para>
<para>
Vector literals may be specified using a type cast and a set of
literal initializers in parentheses or braces. For example,
@@ -129,16 +139,15 @@ vector double g = (vector double) { 3.5, -24.6 };</programlisting>
</para>
<para>
For the C and C++ programming languages (and related/derived
languages), the "Power SIMD C Types" listed in the leftmost
column of <xref linkend="VIPR.biendian.vectypes" /> may be used
when Power SIMD language extensions are enabled. Either
<code>vector</code> or <code>__vector</code> may be used in the
type name. Note that the ELFv2 ABI for Power also includes a
<code>vector _Float16</code> data type. As of this writing, no
current compilers for Power have implemented such a type. This
document does not include that type or any intrinsics related to
it.
</para>
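<para>
For example (an illustrative sketch; the variable names are
hypothetical), either keyword spelling may be used when declaring
objects of these types:
</para>
<programlisting>vector unsigned char vuc;   /* sixteen unsigned 8-bit elements */
__vector signed int vsi;    /* four signed 32-bit elements */
vector double vd;           /* two 64-bit double-precision elements */</programlisting>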
<para>
For the Fortran language, <xref
@@ -158,8 +167,8 @@ vector double g = (vector double) { 3.5, -24.6 };</programlisting>
Pointers to vector types are defined like pointers of other
C/C++ types. Pointers to vector objects may be defined to have
const and volatile properties. Pointers to vector objects must
be addresses divisible by 16, as vector objects are always
aligned on quadword (16-byte, or 128-bit) boundaries.
</para>
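<para>
For example (an illustrative sketch), pointer declarations may
combine these qualifiers in the usual C/C++ manner:
</para>
<programlisting>vector signed int *vp;        /* pointer to vector signed int */
const vector float *cvp;      /* pointer to const vector float */
volatile vector double *vvp;  /* pointer to volatile vector double */</programlisting>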
<para>
The preferred way to access vectors at an application-defined
@@ -172,7 +181,8 @@ vector double g = (vector double) { 3.5, -24.6 };</programlisting>
<emphasis>not</emphasis> be used to access data that is not
aligned at least to a quadword boundary. Built-in functions
such as <code>vec_xl</code> and <code>vec_xst</code> are
provided for unaligned data access. Please refer to <xref
linkend="VIPR.biendian.unaligned" /> for an example.
</para>
<para>
One vector type may be cast to another vector type without
@@ -182,7 +192,8 @@ vector double g = (vector double) { 3.5, -24.6 };</programlisting>
<para>
Compilers are expected to recognize and optimize multiple
operations that can be combined into a single hardware
instruction. For example, a load-and-splat hardware instruction
(such as <emphasis role="bold">lxvdsx</emphasis>)
might be generated for the following sequence:
</para>
<programlisting>double *double_ptr;
@@ -484,35 +495,55 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
</para>
<para>
The traditional C/C++ operators are defined on vector types
for unary and binary <code>+</code>,
unary and binary &#8211;, binary <code>*</code>, binary
<code>%</code>, and binary <code>/</code> as well as the unary
and binary shift, logical and comparison operators, and the
ternary <code>?:</code> operator. These operators perform their
operations "elementwise" on the base elements of the operands,
as follows.
</para>
<para>
For unary operators, the specified operation is performed on
each base element of the single operand to derive the result
value placed into the corresponding element of the vector
result. The result type of unary operations is the type of the
single operand. For example,
</para>
<programlisting>vector signed int a, b;
a = -b;</programlisting>
<para>
produces the same result as
</para>
<programlisting>vector signed int a, b;
a = vec_neg (b);</programlisting>
<para>
For binary operators, the specified operation is performed on
the corresponding base elements of both operands to derive the
result value for each vector element of the vector result. Both
operands of the binary operators must have the same vector type
with the same base element type. The result of binary operators
is the same type as the type of the operands. For example,
</para>
<programlisting>vector signed int a, b;
a = a + b;</programlisting>
<para>
produces the same result as
</para>
<programlisting>vector signed int a, b;
a = vec_add (a, b);</programlisting>
<para>
Further, the array reference operator may be applied to vector
data types, yielding an l-value corresponding to the specified
element in accordance with the vector element numbering rules (see
<xref linkend="VIPR.biendian.layout" />). An l-value may either
be assigned a new value or accessed for reading its value. For
example,
</para>
<programlisting>vector signed int a;
signed int b, c;
b = a[0];
a[3] = c;</programlisting>
</section>


<section xml:id="VIPR.biendian.layout">
@@ -584,6 +615,12 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
</tbody>
</tgroup>
</informaltable>
<para>
This is no longer as useful as it once was. The primary use
case was for big-endian vector layout in little-endian
environments, which is now deprecated as discussed in <xref
linkend="VIPR.biendian.BELE" />.
</para>
<note>
<para>
Note that each element in a vector has the same representation
@@ -632,7 +669,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
compiler implementation for both BE and LE. These sample
implementations are only intended as examples; designers of a
compiler are free to use other methods to implement the
specified semantics.
</para>
<section>
<title>Extended Data Movement Functions</title>
@@ -642,7 +679,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
store instructions and provide access to the “auto-aligning”
memory instructions of the VMX ISA where low-order address
bits are discarded before performing a memory access. These
instructions load and store data in accordance with the
program's current endian mode, and do not need to be adapted
by the compiler to reflect little-endian operation during code
generation.
@@ -744,31 +781,31 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
</tgroup>
</table>
<para>
Before the bi-endian programming model was introduced, the
<code>vec_lvsl</code> and <code>vec_lvsr</code> intrinsics
were supported. These could be used in conjunction with
<code>vec_perm</code> and VMX load and store instructions for
unaligned access. The <code>vec_lvsl</code> and
<code>vec_lvsr</code> interfaces are deprecated in accordance
with the interfaces specified here. For compatibility, the
built-in pseudo sequences published in previous VMX documents
continue to work with little-endian data layout and the
little-endian vector layout described in this document.
However, the use of these sequences in new code is discouraged
and usually results in worse performance. It is recommended
that compilers issue a warning when these functions are used
in little-endian environments.
</para>
<para>
Instead, it is recommended that programmers use the
<code>vec_xl</code> and <code>vec_xst</code> vector built-in
functions to access unaligned data streams. See the
descriptions of these instructions in <xref
linkend="VIPR.vec-ref" /> for further description and
implementation details.
</para>
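<para>
As a brief illustrative sketch (the buffer names are
hypothetical), <code>vec_xl</code> and <code>vec_xst</code> can
be paired to read and write data at addresses that need not be
quadword aligned:
</para>
<programlisting>float *src, *dst;                   /* possibly unaligned addresses */
vector float v = vec_xl (0, src);   /* load 4 floats, no alignment required  */
v = vec_add (v, v);
vec_xst (v, 0, dst);                /* store 4 floats, no alignment required */</programlisting>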
</section>
<section xml:id="VIPR.biendian.BELE">
<title>Big-Endian Vector Layout in Little-Endian Environments
(Deprecated)</title>
<para>
@@ -1047,7 +1084,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>


<section>
<title>Examples and Limitations</title>
<section xml:id="VIPR.biendian.unaligned">
<title>Unaligned vector access</title>
<para>
A common programming error is to cast a pointer to a base type
@@ -1070,8 +1107,8 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
<programlisting> int a[4096];
vector int x = vec_xl (0, a);</programlisting>
</section>
<section xml:id="VIPR.biendian.sld">
<title>vec_sld and vec_sro are not bi-endian</title>
<para>
One oddity in the bi-endian vector programming model is that
<code>vec_sld</code> has big-endian semantics for code
@@ -1099,7 +1136,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
<code>vec_sro</code> is not bi-endian for similar reasons.
</para>
</section>
<section xml:id="VIPR.biendian.vperm">
<title>Limitations on bi-endianness of vec_perm</title>
<para>
The <code>vec_perm</code> intrinsic is bi-endian, provided

@@ -72,8 +72,8 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
</para>
<para>
IBM extended VMX by introducing the Vector-Scalar Extension
(VSX) for the POWER7 family of processors. VSX adds sixty-four
128-bit vector-scalar registers (VSRs); however, to optimize the amount
of per-process register state, the registers overlap with the
VRs and the scalar floating-point registers (FPRs) (see <xref
linkend="VIPR.intro.unified" />). The VSRs can represent all
@@ -88,7 +88,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
Both the VMX and VSX instruction sets have been expanded for the
POWER8 and POWER9 processor families. Starting with POWER8,
a VSR can now contain a single 128-bit integer; and starting
with POWER9, a VSR can contain a single 128-bit IEEE floating-point
value. Again, the ISA currently only supports 128-bit
operations on values in the VRs.
</para>
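<para>
As an illustrative aside (a sketch, not part of the ISA
description above), these capabilities surface in the C language
extensions as 128-bit types such as the following:
</para>
<programlisting>vector unsigned __int128 v128;  /* 128-bit integer held in a VSR (POWER8 and later)     */
__float128 q;                   /* 128-bit IEEE floating-point value (POWER9 and later) */</programlisting>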
@@ -263,6 +263,26 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>POWER8 Processor User's Manual for the Single-Chip
Module.</emphasis>
<emphasis>
<link xlink:href="https://ibm.ent.box.com/s/649rlau0zjcc0yrulqf4cgx5wk3pgbfk">https://ibm.ent.box.com/s/649rlau0zjcc0yrulqf4cgx5wk3pgbfk
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>POWER9 Processor User's Manual.</emphasis>
<emphasis>
<link
xlink:href="https://ibm.ent.box.com/s/tmklq90ze7aj8f4n32er1mu3sy9u8k3k">https://ibm.ent.box.com/s/tmklq90ze7aj8f4n32er1mu3sy9u8k3k
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>Power Vector Library.</emphasis>
@@ -272,6 +292,17 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>POWER8 In-Core Cryptography: The Unofficial
Guide.</emphasis>
<emphasis>
<link
xlink:href="https://github.com/noloader/POWER8-crypto/blob/master/power8-crypto.pdf">https://github.com/noloader/POWER8-crypto/blob/master/power8-crypto.pdf
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>Using the GNU Compiler Collection.</emphasis>

@@ -113,7 +113,9 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
references. (<code>restrict</code> can be used only in C
when compiling for the C99 standard or later.
<code>__restrict__</code> is a language extension, available
in GCC, Clang, and the XL compilers, that can be used
without restriction for both C and C++. See your compiler's
user manual for details.)
</para>
<para>
Suppose you have a function that takes two pointer
@@ -159,8 +161,8 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
<xref linkend="VIPR.techniques.apis" />). In particular, the
Power Vector Library (see <xref
linkend="VIPR.techniques.pveclib" />) provides additional
portability across compiler and ISA versions, as well as
interfaces that hide cases where assembly language is needed.
</para>
</section>


@@ -202,7 +204,10 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
responsible for following the calling conventions established by
the ABI (see <xref linkend="VIPR.intro.links" />). Again, it is
best to look at examples. One place to find well-written
<code>.S</code> files is in the GLIBC project. You can also
study the assembly output from your favorite compiler, which can
be obtained with the <code>-S</code> or similar option, or by
using the <emphasis role="bold">objdump</emphasis> utility.
</para>
</section>


@@ -214,13 +219,15 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
<section>
<title>x86 Vector Portability Headers</title>
<para>
Recent versions of the GCC and Clang open-source compilers
for Power provide "drop-in" portability headers for portions
of the Intel Architecture Instruction Set Extensions (see <xref
linkend="VIPR.intro.links" />). These headers mirror the APIs
of Intel headers having the same names. As of this writing,
support is provided for the MMX and SSE layers, up through
SSE3 and portions of SSE4. No support for the AVX layers is
envisioned. The portability headers are available starting
with GCC 8.1 and Clang 9.0.0.
</para>
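<para>
As a brief illustrative sketch (assuming a GCC or Clang
installation that ships these headers; the
<code>NO_WARN_X86_INTRINSICS</code> macro and the
<code>xmmintrin.h</code> file name are those used by the GCC
implementation), existing SSE code can often be recompiled
unchanged on Power:
</para>
<programlisting>#define NO_WARN_X86_INTRINSICS   /* acknowledge use of the compatibility layer */
#include &lt;xmmintrin.h&gt;

__m128 add4 (__m128 a, __m128 b)
{
  return _mm_add_ps (a, b);      /* mapped to a VSX single-precision vector add */
}</programlisting>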
<para>
The portability headers provide the same semantics as the

File diff suppressed because it is too large.