Significant updates to chapters 1-3. Delete old outline file.

6 years ago · 43c3a2851a
parent 8ca11d40b3
commit 43c3a2851a
6 changed files with 261 additions and 91 deletions
--- a/Intrinsics_Reference/ch_biendian.xml
+++ b/Intrinsics_Reference/ch_biendian.xml
@@ -22,11 +22,11 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">

  <para>
    To ensure portability of applications optimized to exploit the
-    SIMD functions of POWER ISA processors, the ELF V2 ABI defines a
-    set of functions and data types for SIMD programming. ELF
-    V2-compliant compilers will provide suitable support for these
-    functions, preferably as built-in functions that translate to one
-    or more POWER ISA instructions.
+    SIMD functions of POWER ISA processors, this reference defines a
+    set of functions and data types for SIMD programming.  Compliant
+    compilers will provide suitable support for these functions,
+    preferably as built-in functions that translate to one or more
+    POWER ISA instructions.
  </para>
  <para>
    Compilers are encouraged, but not required, to provide built-in
@@ -43,27 +43,26 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
    built-in functions are implemented with different instruction
    sequences for LE and BE. To achieve this, vector built-in
    functions provide a set of functions derived from the set of
-    hardware functions provided by the Power vector SIMD
-    instructions. Unlike traditional “hardware intrinsic” built-in
-    functions, no fixed mapping exists between these built-in
-    functions and the generated hardware instruction sequence. Rather,
-    the compiler is free to generate optimized instruction sequences
-    that implement the semantics of the program specified by the
-    programmer using these built-in functions.
+    hardware functions provided by the POWER SIMD instructions. Unlike
+    traditional “hardware intrinsic” built-in functions, no fixed
+    mapping exists between these built-in functions and the generated
+    hardware instruction sequence. Rather, the compiler is free to
+    generate optimized instruction sequences that implement the
+    semantics of the program specified by the programmer using these
+    built-in functions. 
  </para>
  <para>
-    This is primarily applicable to the POWER SIMD instructions.  As
-    we've seen, this set of instructions operates on groups of 2, 4,
-    8, or 16 vector elements at a time in 128-bit registers. On a
-    big-endian POWER platform, vector elements are loaded from memory
-    into a register so that the 0th element occupies the high-order
-    bits of the register, and the (N &#8211; 1)th element occupies the
-    low-order bits of the register. This is referred to as big-endian
-    element order. On a little-endian POWER platform, vector elements
-    are loaded from memory such that the 0th element occupies the
-    low-order bits of the register, and the (N &#8211; 1)th element
-    occupies the high-order bits. This is referred to as little-endian
-    element order.
+    As we've seen, the POWER SIMD instructions operate on groups of 1,
+    2, 4, 8, or 16 vector elements at a time in 128-bit registers. On
+    a big-endian POWER platform, vector elements are loaded from
+    memory into a register so that the 0th element occupies the
+    high-order bits of the register, and the (N &#8211; 1)th element
+    occupies the low-order bits of the register. This is referred to
+    as big-endian element order. On a little-endian POWER platform,
+    vector elements are loaded from memory such that the 0th element
+    occupies the low-order bits of the register, and the (N &#8211;
+    1)th element occupies the high-order bits. This is referred to as
+    little-endian element order.
  </para> 

  <note>
@@ -74,6 +73,46 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
  </note>

  <section>
+    <title>Language Elements</title>
+    <para>
+      The C and C++ languages are extended to use new identifiers
+      <code>vector</code>, <code>pixel</code>, <code>bool</code>,
+      <code>__vector</code>, <code>__pixel</code>, and
+      <code>__bool</code>.  These keywords are used to specify vector
+      data types (<xref linkend="VIPR.ch-data-types" />).  Because
+      these identifiers may conflict with keywords in more recent C
+      and C++ language standards, compilers may implement these in one
+      of two ways.
+    </para>
+    <itemizedlist>
+      <listitem>
+	<para>
+	  <code>__vector</code>, <code>__pixel</code>,
+	  <code>__bool</code>, and <code>bool</code> are defined as
+	  keywords, with <code>vector</code> and <code>pixel</code> as
+	  predefined macros that expand to <code>__vector</code> and
+	  <code>__pixel</code>, respectively.
+	</para>
+      </listitem>
+      <listitem>
+	<para>
+	  <code>__vector</code>, <code>__pixel</code>, and
+	  <code>__bool</code> are defined as keywords in all contexts,
+	  while <code>vector</code>, <code>pixel</code>, and
+	  <code>bool</code> are treated as keywords only within the
+	  context of a type declaration.
+	</para>
+      </listitem>
+    </itemizedlist>
+    <para>
+      Vector literals may be specified using a type cast and a set of
+      literal initializers in parentheses or braces.  For example,
+    </para>
+    <programlisting>vector int x = (vector int) (4, -1, 3, 6);
+vector double g = (vector double) { 3.5, -24.6 };</programlisting>
+  </section>
+
+  <section xml:id="VIPR.ch-data-types">
    <title>Vector Data Types</title>
    <para>
      Languages provide support for the data types in <xref
@@ -84,13 +123,8 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
      For the C and C++ programming languages (and related/derived
      languages), these data types may be accessed based on the type
      names listed in <xref linkend="VIPR.biendian.vectypes" /> when
-      Power ISA SIMD language extensions are enabled using either the
-      <code>vector</code> or <code>__vector</code> keywords.  [FIXME:
-      We haven't talked about these at all.  Need to borrow some
-      description from the AltiVec PIM about the usage of vector,
-      bool, and pixel, and supplement with the problems this causes
-      with strict-ANSI C++.  Maybe a separate section on "Language
-      Elements" should precede this one.]
+      POWER SIMD language extensions are enabled using either the
+      <code>vector</code> or <code>__vector</code> keywords.
    </para> 
    <para>
      For the Fortran language, <xref
@@ -126,6 +160,11 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
      such as <code>vec_xl</code> and <code>vec_xst</code> are
      provided for unaligned data access.
    </para>
+    <para>
+      One vector type may be cast to another vector type without
+      restriction.  Such a cast is simply a reinterpretation of the
+      bits, and does not change the data.
+    </para>
    <para>
      Compilers are expected to recognize and optimize multiple
      operations that can be optimized into a single hardware
@@ -252,6 +291,21 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
              2<superscript>16</superscript> &#8211; 1.</para>
            </entry>
          </row>
+          <row>
+            <entry>
+              <para>vector pixel</para>
+            </entry>
+            <entry>
+              <para>16</para>
+            </entry>
+            <entry>
+              <para>Quadword</para>
+            </entry>
+            <entry>
+              <para>Vector of 8 halfwords, each interpreted as a 1-bit
+	      channel and three 5-bit channels.</para>
+            </entry>
+          </row>
          <row>
            <entry>
              <para>vector unsigned int</para>
@@ -424,11 +478,9 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
    <title>Vector Operators</title>
    <para>
      In addition to the dereference and assignment operators, the
-      Power SIMD Vector Programming API [FIXME: If we're going to use
-      a term like this, let's use it consistently; also, SIMD and
-      Vector are redundant] provides the usual operators that are
-      valid on pointers; these operators are also valid for pointers
-      to vector types.
+      POWER Bi-Endian Vector Programming Model provides the usual
+      operators that are valid on pointers; these operators are also
+      valid for pointers to vector types.
    </para>
    <para>
      The traditional C/C++ operators are defined on vector types
@@ -580,7 +632,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
 	bits are discarded before performing a memory access. These
 	instructions access load and store data in accordance with the
 	program's current endian mode, and do not need to be adapted
-	by the compiler to reflect little-endian operating during code
+	by the compiler to reflect little-endian operation during code
 	generation.
      </para>
      <table frame="all" pgwide="1" xml:id="VIPR.biendian.vmx-mem">
@@ -683,7 +735,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
 	Previous versions of the VMX built-in functions defined
 	intrinsics to access the VMX instructions <code>lvsl</code>
 	and <code>lvsr</code>, which could be used in conjunction with
-	<code>vec_vperm</code> and VMX load and store instructions for
+	<code>vec_perm</code> and VMX load and store instructions for
 	unaligned access. The <code>vec_lvsl</code> and
 	<code>vec_lvsr</code> interfaces are deprecated in accordance
 	with the interfaces specified here. For compatibility, the
@@ -694,12 +746,14 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
 	discouraged and usually results in worse performance. It is
 	recommended (but not required) that compilers issue a warning
 	when these functions are used in little-endian
-	environments. It is recommended that programmers use the
-	<code>vec_xl</code> and <code>vec_xst</code> vector built-in
-	functions to access unaligned data streams.  See the
-	descriptions of these instructions in <xref
-	linkend="VIPR.vec-ref" /> for further description and
-	implementation details.
+	environments.
+      </para>
+      <para>
+	It is recommended that programmers use the <code>vec_xl</code>
+	and <code>vec_xst</code> vector built-in functions to access
+	unaligned data streams.  See the descriptions of these
+	built-in functions in <xref linkend="VIPR.vec-ref" /> for
+	usage and implementation details.
      </para>
    </section>
    <section>
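The element-order rule stated in the ch_biendian.xml hunks above can be illustrated with a small model in portable C. This is only a sketch and not part of the patch: the names `reg128`, `element_slot`, and `load_vector` are hypothetical, and the 128-bit register is modeled as four 32-bit slots ordered from high-order to low-order bits rather than as real hardware state.

```c
#include <stdint.h>

#define NUM_ELEMS 4  /* four 32-bit elements fill one 128-bit register */

/* Hypothetical register model: slot[0] holds the highest-order 32 bits,
   slot[NUM_ELEMS - 1] the lowest-order 32 bits. */
typedef struct {
    uint32_t slot[NUM_ELEMS];
} reg128;

/* Slot that element i lands in.  Big-endian element order puts element 0
   in the high-order bits; little-endian element order puts element 0 in
   the low-order bits. */
static int element_slot(int i, int big_endian)
{
    return big_endian ? i : (NUM_ELEMS - 1 - i);
}

/* Model a vector load of NUM_ELEMS consecutive elements from memory. */
static reg128 load_vector(const uint32_t *mem, int big_endian)
{
    reg128 r;
    for (int i = 0; i < NUM_ELEMS; i++)
        r.slot[element_slot(i, big_endian)] = mem[i];
    return r;
}
```

With memory contents {10, 20, 30, 40}, the big-endian model places 10 in `slot[0]` (high-order bits), while the little-endian model places 10 in `slot[3]` (low-order bits). In both cases the program still refers to that value as element 0, which is the property the bi-endian built-in functions rely on.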
--- a/Intrinsics_Reference/ch_intro.xml
+++ b/Intrinsics_Reference/ch_intro.xml
@@ -128,12 +128,87 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">

  <section xml:id="VIPR.intro.unified">
    <title>The Unified Vector Register Set</title>
-    <para>filler</para>
+    <para>
+      In OpenPOWER-compliant processors, floating-point and vector
+      operations are implemented using a unified vector-scalar model.
+      As shown in <xref linkend="FPR-VSR" /> and <xref
+      linkend="VR-VSR" />, there are 64 vector-scalar registers; each
+      is 128 bits wide.
+    </para>
+    <para>
+      The vector-scalar registers can be addressed with VSX
+      instructions, for vector and scalar processing of all 64
+      registers, or with the "classic" POWER floating-point
+      instructions to refer to a 32-register subset of these, having
+      64 bits per register.  They can also be addressed with VMX
+      instructions to refer to a 32-register subset of 128-bit registers.
+    </para>
+    <figure pgwide="1" xml:id="FPR-VSR">
+      <title>Floating-Point Registers as Part of VSRs</title>
+      <mediaobject>
+	<imageobject>
+	  <imagedata fileref="fig-fpr-vsr.png" format="PNG"
+		     scalefit="1" width="100%" />
+	</imageobject>
+      </mediaobject>
+    </figure>
+    <figure pgwide="1" xml:id="VR-VSR">
+      <title>Vector Registers as Part of VSRs</title>
+      <mediaobject>
+	<imageobject>
+	  <imagedata fileref="fig-vr-vsr.png" format="PNG"
+		     scalefit="1" width="100%" />
+	</imageobject>
+      </mediaobject>
+    </figure>
  </section>

  <section xml:id="VIPR.intro.links">
    <title>Useful Links</title>
-    <para>filler</para>
+    <para>
+      The following documents provide additional reference materials.
+    </para>
+    <itemizedlist>
+      <listitem>
+	<para>
+	  <emphasis>64-Bit ELF V2 ABI Specification - Power
+	  Architecture.</emphasis>
+	  <emphasis>
+	    <link xlink:href="https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture">https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture
+	    </link>
+	  </emphasis>
+	</para>
+      </listitem>
+      <listitem>
+	<para>
+	  <emphasis>AltiVec Technology Program Interface
+	  Manual.</emphasis>
+	  <emphasis>
+	    <link xlink:href="https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf">https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf
+	    </link>
+	  </emphasis>
+	</para>
+      </listitem>
+      <listitem>
+	<para>
+	  <emphasis>Intel Architecture Instruction Set Extensions and
+	  Future Features Programming Reference.</emphasis>
+	  <emphasis>
+	    <link xlink:href="https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf">https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
+	    </link>
+	  </emphasis>
+	</para>
+      </listitem>
+      <listitem>
+	<para>
+	  <emphasis>Power Vector Library.</emphasis>
+	  <emphasis>
+	    <link xlink:href="https://github.com/open-power-sdk/pveclib">https://github.com/open-power-sdk/pveclib
+	    </link>
+	  </emphasis>
+	</para>
+      </listitem>
+    </itemizedlist>
  </section>

 </chapter>
--- a/Intrinsics_Reference/ch_outline.xml
+++ b/Intrinsics_Reference/ch_outline.xml
@@ -1,45 +0,0 @@
-<!--
-  Copyright (c) 2016 OpenPOWER Foundation
-  
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License.
-  
-->
-<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
-xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_outline">
-  
-  <!-- Chapter Title goes here. -->
-  <title>Notes on what to include</title>
-
-  <itemizedlist spacing="compact">
-    <listitem>
-      <para>Rewrite the material from ABI Chapter 6</para>
-    </listitem>
-    <listitem>
-      <para>Recommendations for different ways to create efficient vector
-      code
-      <itemizedlist spacing="compact">
-	<listitem>
-	  <para>Portable: C,C++; tricks to help compiler vectorize code</para>
-	</listitem>
-	<listitem>
-	  <para>Use intrinsics</para>
-	</listitem>
-	<listitem>
-	  <para>Assembly code - not recommended, but if you must</para>
-	</listitem>
-      </itemizedlist>
-      </para>
-    </listitem>
-  </itemizedlist>
-
-</chapter>
--- a/Intrinsics_Reference/ch_techniques.xml
+++ b/Intrinsics_Reference/ch_techniques.xml
@@ -51,6 +51,92 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
  <section>
    <title>Use Assembly Code Sparingly</title>
    <para>filler</para>
+    <section>
+      <title>Inline Assembly</title>
+      <para>filler</para>
+    </section>
+    <section>
+      <title>Assembly Files</title>
+      <para>filler</para>
+    </section>
+  </section>
+
+  <section>
+    <title>Other Vector Programming APIs</title>
+    <para>In addition to the intrinsic functions provided in this
+    reference, programmers should be aware of other vector programming
+    API resources.</para>
+    <section>
+      <title>x86 Vector Portability Headers</title>
+      <para>
+	Recent versions of the <code>gcc</code> and <code>clang</code>
+	open-source compilers provide "drop-in" portability headers
+	for portions of the Intel Architecture Instruction Set
+	Extensions (see <xref linkend="VIPR.intro.links" />).  These
+	headers mirror the APIs of Intel headers having the same
+	names.  Support is provided for the MMX and SSE layers, up
+	through SSE4.  At this time, no support for the AVX layers is
+	envisioned.
+      </para>
+      <para>
+	The portability headers provide the same semantics as the
+	corresponding Intel APIs, but using VMX and VSX instructions
+	to emulate the Intel vector instructions.  It should be
+	emphasized that these headers are provided for portability,
+	and will not necessarily perform optimally (although in many
+	cases the performance is very good).  Using these headers is
+	often a good first step in porting a library using Intel
+	intrinsics to POWER, after which more detailed rewriting of
+	algorithms is usually desirable for best performance.
+      </para>
+      <para>
+	Access to the portability APIs occurs automatically when
+	including one of the corresponding Intel header files, such as
+	<code>&lt;mmintrin.h&gt;</code>.
+      </para>
+    </section>
+    <section>
+      <title>The POWER Vector Library (pveclib)</title>
+      <para>The POWER Vector Library, also known as
+      <code>pveclib</code>, is a separate project available from
+      GitHub (see <xref linkend="VIPR.intro.links" />).  The
+      <code>pveclib</code> project builds on top of the intrinsics
+      described in this manual to provide higher-level vector
+      interfaces that are highly portable.  The goals of the project
+      include:
+      </para>
+      <itemizedlist>
+	<listitem>
+	  <para>
+	    Providing equivalent functions across versions of the
+	    POWER ISA.  For example, the <emphasis>Vector
+	    Multiply-by-10 Unsigned Quadword</emphasis> operation
+	    introduced in POWER ISA 3.0 (POWER9) can be implemented
+	    using a few vector instructions on earlier POWER ISA
+	    versions.
+	  </para>
+	</listitem>
+	<listitem>
+	  <para>
+	    Providing equivalent functions across compiler versions.
+	    For example, intrinsics provided in later versions of the
+	    compiler can be implemented as inline functions with
+	    inline assembly in earlier compiler versions.
+	  </para>
+	</listitem>
+	<listitem>
+	  <para>
+	    Providing higher-order functions not provided directly by
+	    the POWER ISA.  One example is a SIMD implementation
+	    for ASCII <code>__isalpha</code> and similar functions.
+	    Another example is full <code>__int128</code>
+	    implementations of <emphasis>Count Leading
+	    Zeroes</emphasis>, <emphasis>Population Count</emphasis>,
+	    and <emphasis>Multiply</emphasis>.
+	  </para>
+	</listitem>
+      </itemizedlist>
+    </section>
  </section>

 </chapter>
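The "drop-in" use of the x86 portability headers described in the ch_techniques.xml hunk above might look like the following sketch. It is not part of the patch. The helper name `add4` is hypothetical; the intrinsics `_mm_loadu_ps`, `_mm_add_ps`, and `_mm_storeu_ps` come from the SSE API in `<xmmintrin.h>`. As an assumption to verify against your compiler's documentation, GCC on POWER may require defining `NO_WARN_X86_INTRINSICS` before it accepts this header.

```c
/* Assumption: on POWER, build with -DNO_WARN_X86_INTRINSICS if the
   compiler rejects the header by default. */
#include <xmmintrin.h>

/* Add four pairs of single-precision floats with SSE intrinsics.
   The same source is intended to build on x86 and, through the
   portability headers, on POWER. */
static void add4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);   /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

Code like this is a reasonable first porting step. As the text notes, rewriting hot paths with the native intrinsics is usually still worthwhile for best performance.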
--- a/Intrinsics_Reference/fig-fpr-vsr.png
+++ b/Intrinsics_Reference/fig-fpr-vsr.png
--- a/Intrinsics_Reference/fig-vr-vsr.png
+++ b/Intrinsics_Reference/fig-vr-vsr.png