Significant updates to chapters 1-3. Delete old outline file.

7 years ago · c086fbb288
parent 57b40b4d84
commit c086fbb288
6 changed files with 261 additions and 91 deletions
--- a/Intrinsics_Reference/ch_biendian.xml
+++ b/Intrinsics_Reference/ch_biendian.xml
@ -22,11 +22,11 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
  <para>
    To ensure portability of applications optimized to exploit the
-    SIMD functions of POWER ISA processors, the ELF V2 ABI defines a
+    SIMD functions of POWER ISA processors, this reference defines a
-    set of functions and data types for SIMD programming. ELF
+    set of functions and data types for SIMD programming.  Compliant
-    V2-compliant compilers will provide suitable support for these
+    compilers will provide suitable support for these functions,
-    functions, preferably as built-in functions that translate to one
+    preferably as built-in functions that translate to one or more
-    or more POWER ISA instructions.
+    POWER ISA instructions.
  </para>
  <para>
    Compilers are encouraged, but not required, to provide built-in
@ -43,27 +43,26 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
    built-in functions are implemented with different instruction
    sequences for LE and BE. To achieve this, vector built-in
    functions provide a set of functions derived from the set of
-    hardware functions provided by the Power vector SIMD
+    hardware functions provided by the POWER SIMD instructions. Unlike
-    instructions. Unlike traditional “hardware intrinsic” built-in
+    traditional “hardware intrinsic” built-in functions, no fixed
-    functions, no fixed mapping exists between these built-in
+    mapping exists between these built-in functions and the generated
-    functions and the generated hardware instruction sequence. Rather,
+    hardware instruction sequence. Rather, the compiler is free to
-    the compiler is free to generate optimized instruction sequences
+    generate optimized instruction sequences that implement the
-    that implement the semantics of the program specified by the
+    semantics of the program specified by the programmer using these
-    programmer using these built-in functions.
+    built-in functions. 
-  </para>
+  </para>
-  <para>
+  <para>
-    This is primarily applicable to the POWER SIMD instructions.  As
+    As we've seen, the POWER SIMD instructions operate on groups of 1,
-    we've seen, this set of instructions operates on groups of 2, 4,
+    2, 4, 8, or 16 vector elements at a time in 128-bit registers. On
-    8, or 16 vector elements at a time in 128-bit registers. On a
+    a big-endian POWER platform, vector elements are loaded from
-    big-endian POWER platform, vector elements are loaded from memory
+    memory into a register so that the 0th element occupies the
-    into a register so that the 0th element occupies the high-order
+    high-order bits of the register, and the (N &#8211; 1)th element
-    bits of the register, and the (N &#8211; 1)th element occupies the
+    occupies the low-order bits of the register. This is referred to
-    low-order bits of the register. This is referred to as big-endian
+    as big-endian element order. On a little-endian POWER platform,
-    element order. On a little-endian POWER platform, vector elements
+    vector elements are loaded from memory such that the 0th element
-    are loaded from memory such that the 0th element occupies the
+    occupies the low-order bits of the register, and the (N &#8211;
-    low-order bits of the register, and the (N &#8211; 1)th element
+    1)th element occupies the high-order bits. This is referred to as
-    occupies the high-order bits. This is referred to as little-endian
+    little-endian element order.
-    element order.
  </para> 
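  <para>
    As a rough illustration of the two element orders, the following
    portable C sketch computes where an element sits in a 128-bit
    register.  The helper name <code>element_low_bit</code> is
    hypothetical and is not part of any vector API; bit 0 is assumed
    to be the lowest-order bit of the register.
  </para>
  <programlisting><![CDATA[/* Hypothetical helper, not part of any vector API: returns the
 * position of the lowest-order bit of element i in a 128-bit
 * register holding n equal-width elements.  Bit 0 is taken to be
 * the lowest-order bit of the register. */
unsigned element_low_bit(unsigned i, unsigned n, int big_endian)
{
    unsigned width = 128u / n;   /* bits per element */

    /* Big-endian element order: element 0 is in the high-order bits.
     * Little-endian element order: element 0 is in the low-order bits. */
    return big_endian ? (n - 1u - i) * width : i * width;
}]]></programlisting>
  <para>
    For four 32-bit elements, this places element 0 in bits 96
    through 127 in big-endian element order, and in bits 0 through 31
    in little-endian element order.
  </para>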
  <note>
@ -74,6 +73,46 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
  </note>
  <section>
    <title>Language Elements</title>
    <para>
      The C and C++ languages are extended to use new identifiers
      <code>vector</code>, <code>pixel</code>, <code>bool</code>,
      <code>__vector</code>, <code>__pixel</code>, and
      <code>__bool</code>.  These keywords are used to specify vector
      data types (<xref linkend="VIPR.ch-data-types" />).  Because
      these identifiers may conflict with keywords in more recent C
      and C++ language standards, compilers may implement these in one
      of two ways.
    </para>
    <itemizedlist>
      <listitem>
 	<para>
 	  <code>__vector</code>, <code>__pixel</code>,
 	  <code>__bool</code>, and <code>bool</code> are defined as
 	  keywords, with <code>vector</code> and <code>pixel</code> as
 	  predefined macros that expand to <code>__vector</code> and
 	  <code>__pixel</code>, respectively.
 	</para>
      </listitem>
      <listitem>
 	<para>
 	  <code>__vector</code>, <code>__pixel</code>, and
 	  <code>__bool</code> are defined as keywords in all contexts,
 	  while <code>vector</code>, <code>pixel</code>, and
 	  <code>bool</code> are treated as keywords only within the
 	  context of a type declaration.
 	</para>
      </listitem>
    </itemizedlist>
    <para>
      Vector literals may be specified using a type cast and a set of
      literal initializers in parentheses or braces.  For example:
    </para>
    <programlisting>vector int x = (vector int) (4, -1, 3, 6);
 vector double g = (vector double) { 3.5, -24.6 };</programlisting>
  </section>
  <section xml:id="VIPR.ch-data-types">
    <title>Vector Data Types</title>
    <para>
      Languages provide support for the data types in <xref
@ -84,13 +123,8 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
      For the C and C++ programming languages (and related/derived
      languages), these data types may be accessed based on the type
      names listed in <xref linkend="VIPR.biendian.vectypes" /> when
-      Power ISA SIMD language extensions are enabled using either the
+      POWER SIMD language extensions are enabled using either the
-      <code>vector</code> or <code>__vector</code> keywords.  [FIXME:
+      <code>vector</code> or <code>__vector</code> keywords.
-      We haven't talked about these at all.  Need to borrow some
-      description from the AltiVec PIM about the usage of vector,
-      bool, and pixel, and supplement with the problems this causes
-      with strict-ANSI C++.  Maybe a separate section on "Language
-      Elements" should precede this one.]
    </para> 
    <para>
      For the Fortran language, <xref
@ -126,6 +160,11 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
      such as <code>vec_xl</code> and <code>vec_xst</code> are
      provided for unaligned data access.
    </para>
    <para>
      One vector type may be cast to another vector type without
      restriction.  Such a cast is simply a reinterpretation of the
      bits, and does not change the data.
    </para>
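    <para>
      The reinterpretation can be modeled in portable C.  This is a
      sketch only: the struct types and the function name below are
      hypothetical stand-ins for <code>vector float</code> and
      <code>vector unsigned int</code>, and the bit patterns mentioned
      afterward assume IEEE 754 single-precision floats.  In real
      vector code the cast is written directly, as in
      <code>(vector unsigned int) vf</code>.
    </para>
    <programlisting><![CDATA[#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins that only model the 16 bytes of storage;
 * they are not the real vector types. */
typedef struct { float f[4]; } model_vec_float;
typedef struct { uint32_t u[4]; } model_vec_uint;

/* Models a vector cast: the 16 bytes are copied unchanged, so the
 * bits are reinterpreted and no value conversion takes place. */
model_vec_uint model_cast_float_to_uint(model_vec_float v)
{
    model_vec_uint r;
    memcpy(&r, &v, sizeof r);
    return r;
}]]></programlisting>
    <para>
      In this model, an element holding <code>1.0f</code> reinterprets
      as the bit pattern <code>0x3F800000</code>; it is not converted
      to the integer value 1.
    </para>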
    <para>
      Compilers are expected to recognize and optimize multiple
      operations that can be optimized into a single hardware
@ -252,6 +291,21 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
              2<superscript>16</superscript> &#8211; 1.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector pixel</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 8 halfwords, each interpreted as a 1-bit
 	      channel and three 5-bit channels.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector unsigned int</para>
@ -424,11 +478,9 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
    <title>Vector Operators</title>
    <para>
      In addition to the dereference and assignment operators, the
-      Power SIMD Vector Programming API [FIXME: If we're going to use
+      POWER Bi-Endian Vector Programming Model provides the usual
-      a term like this, let's use it consistently; also, SIMD and
+      operators that are valid on pointers; these operators are also
-      Vector are redundant] provides the usual operators that are
+      valid for pointers to vector types.
-      valid on pointers; these operators are also valid for pointers
-      to vector types.
    </para>
    <para>
      The traditional C/C++ operators are defined on vector types
@ -580,7 +632,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
 	bits are discarded before performing a memory access. These
 	instructions load and store data in accordance with the
 	program's current endian mode, and do not need to be adapted
-	by the compiler to reflect little-endian operating during code
+	by the compiler to reflect little-endian operation during code
 	generation.
      </para>
      <table frame="all" pgwide="1" xml:id="VIPR.biendian.vmx-mem">
@ -683,7 +735,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
 	Previous versions of the VMX built-in functions defined
 	intrinsics to access the VMX instructions <code>lvsl</code>
 	and <code>lvsr</code>, which could be used in conjunction with
-	<code>vec_vperm</code> and VMX load and store instructions for
+	<code>vec_perm</code> and VMX load and store instructions for
 	unaligned access. The <code>vec_lvsl</code> and
 	<code>vec_lvsr</code> interfaces are deprecated in accordance
 	with the interfaces specified here. For compatibility, the
@ -694,12 +746,14 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
 	discouraged and usually results in worse performance. It is
 	recommended (but not required) that compilers issue a warning
 	when these functions are used in little-endian
-	environments. It is recommended that programmers use the
+	environments.
-	<code>vec_xl</code> and <code>vec_xst</code> vector built-in
+      </para>
-	functions to access unaligned data streams.  See the
+      <para>
-	descriptions of these instructions in <xref
+	It is recommended that programmers use the <code>vec_xl</code>
-	linkend="VIPR.vec-ref" /> for further description and
+	and <code>vec_xst</code> vector built-in functions to access
-	implementation details.
+	unaligned data streams.  See the descriptions of these
+	instructions in <xref linkend="VIPR.vec-ref" /> for further
+	description and implementation details.
      </para>
    </section>
    <section>
--- a/Intrinsics_Reference/ch_intro.xml
+++ b/Intrinsics_Reference/ch_intro.xml
@ -128,12 +128,87 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
  <section xml:id="VIPR.intro.unified">
    <title>The Unified Vector Register Set</title>
-    <para>filler</para>
+    <para>
      In OpenPOWER-compliant processors, floating-point and vector
      operations are implemented using a unified vector-scalar model.
      As shown in <xref linkend="FPR-VSR" /> and <xref
      linkend="VR-VSR" />, there are 64 vector-scalar registers; each
      is 128 bits wide.
    </para>
    <para>
      The vector-scalar registers can be addressed with VSX
      instructions, which provide vector and scalar processing on all
      64 registers.  The "classic" POWER floating-point instructions
      address a 32-register subset (VSRs 0 through 31), using 64 bits
      of each register.  The VMX instructions address the remaining
      32-register subset (VSRs 32 through 63) as full 128-bit
      registers.
    </para>
    <figure pgwide="1" xml:id="FPR-VSR">
      <title>Floating-Point Registers as Part of VSRs</title>
      <mediaobject>
 	<imageobject>
 	  <imagedata fileref="fig-fpr-vsr.png" format="PNG"
 		     scalefit="1" width="100%" />
 	</imageobject>
      </mediaobject>
    </figure>
    <figure pgwide="1" xml:id="VR-VSR">
      <title>Vector Registers as Part of VSRs</title>
      <mediaobject>
 	<imageobject>
 	  <imagedata fileref="fig-vr-vsr.png" format="PNG"
 		     scalefit="1" width="100%" />
 	</imageobject>
      </mediaobject>
    </figure>
  </section>
  <section xml:id="VIPR.intro.links">
    <title>Useful Links</title>
-    <para>filler</para>
+    <para>
      The following documents provide additional reference materials.
    </para>
    <itemizedlist>
      <listitem>
 	<para>
 	  <emphasis>64-Bit ELF V2 ABI Specification - Power
 	  Architecture.</emphasis>
 	  <emphasis>
 	    <link xlink:href="https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture">https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture
 	    </link>
 	  </emphasis>
 	</para>
      </listitem>
      <listitem>
 	<para>
 	  <emphasis>AltiVec Technology Program Interface
 	  Manual.</emphasis>
 	  <emphasis>
 	    <link xlink:href="https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf">https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf
 	    </link>
 	  </emphasis>
 	</para>
      </listitem>
      <listitem>
 	<para>
 	  <emphasis>Intel Architecture Instruction Set Extensions and
 	  Future Features Programming Reference.</emphasis>
 	  <emphasis>
 	    <link xlink:href="https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf">https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
 	    </link>
 	  </emphasis>
 	</para>
      </listitem>
      <listitem>
 	<para>
 	  <emphasis>Power Vector Library.</emphasis>
 	  <emphasis>
 	    <link xlink:href="https://github.com/open-power-sdk/pveclib">https://github.com/open-power-sdk/pveclib
 	    </link>
 	  </emphasis>
 	</para>
      </listitem>
    </itemizedlist>
  </section>
 </chapter>
--- a/Intrinsics_Reference/ch_outline.xml
+++ b/Intrinsics_Reference/ch_outline.xml
@ -1,45 +0,0 @@
 <!--
  Copyright (c) 2016 OpenPOWER Foundation
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 -->
 <chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
 xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_outline">
  <!-- Chapter Title goes here. -->
  <title>Notes on what to include</title>
  <itemizedlist spacing="compact">
    <listitem>
      <para>Rewrite the material from ABI Chapter 6</para>
    </listitem>
    <listitem>
      <para>Recommendations for different ways to create efficient vector
      code
      <itemizedlist spacing="compact">
 	<listitem>
 	  <para>Portable: C,C++; tricks to help compiler vectorize code</para>
 	</listitem>
 	<listitem>
 	  <para>Use intrinsics</para>
 	</listitem>
 	<listitem>
 	  <para>Assembly code - not recommended, but if you must</para>
 	</listitem>
      </itemizedlist>
      </para>
    </listitem>
  </itemizedlist>
 </chapter>
--- a/Intrinsics_Reference/ch_techniques.xml
+++ b/Intrinsics_Reference/ch_techniques.xml
@ -51,6 +51,92 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
  <section>
    <title>Use Assembly Code Sparingly</title>
    <para>filler</para>
    <section>
      <title>Inline Assembly</title>
      <para>filler</para>
    </section>
    <section>
      <title>Assembly Files</title>
      <para>filler</para>
    </section>
  </section>
  <section>
    <title>Other Vector Programming APIs</title>
    <para>In addition to the intrinsic functions provided in this
    reference, programmers should be aware of other vector programming
    API resources.</para>
    <section>
      <title>x86 Vector Portability Headers</title>
      <para>
 	Recent versions of the <code>gcc</code> and <code>clang</code>
 	open source compilers provide "drop-in" portability headers
 	for portions of the Intel Architecture Instruction Set
 	Extensions (see <xref linkend="VIPR.intro.links" />).  These
 	headers mirror the APIs of Intel headers having the same
 	names.  Support is provided for the MMX and SSE layers, up
 	through SSE4.  At this time, no support for the AVX layers is
 	envisioned.
      </para>
      <para>
 	The portability headers provide the same semantics as the
 	corresponding Intel APIs, but using VMX and VSX instructions
 	to emulate the Intel vector instructions.  It should be
 	emphasized that these headers are provided for portability,
 	and will not necessarily perform optimally (although in many
 	cases the performance is very good).  Using these headers is
 	often a good first step in porting a library using Intel
 	intrinsics to POWER, after which more detailed rewriting of
 	algorithms is usually desirable for best performance.
      </para>
      <para>
 	The portability APIs become available automatically when a
 	program includes one of the corresponding Intel header files,
 	such as <code>&lt;mmintrin.h&gt;</code>.
      </para>
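      <para>
        As a minimal sketch, the following function uses only SSE
        intrinsics from <code>&lt;xmmintrin.h&gt;</code> and is
        intended to compile unchanged on x86 and, through the
        portability headers, on POWER.  The function name
        <code>add4</code> is illustrative only.  The compiler may
        require an explicit opt-in: GCC on POWER, for example, expects
        the macro <code>NO_WARN_X86_INTRINSICS</code> to be defined
        (<code>-DNO_WARN_X86_INTRINSICS</code>) before it accepts
        these headers.
      </para>
      <programlisting><![CDATA[#include <xmmintrin.h>   /* SSE API; a portability header on POWER */

/* Illustrative helper: adds four floats element-wise using SSE
 * intrinsics.  The unaligned load and store forms are used, so the
 * arrays need no particular alignment. */
void add4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}]]></programlisting>
      <para>
        Code of this kind is a reasonable starting point for a port;
        hot loops can later be rewritten with the native intrinsics
        described in this reference.
      </para>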
    </section>
    <section>
      <title>The POWER Vector Library (pveclib)</title>
      <para>The POWER Vector Library, also known as
      <code>pveclib</code>, is a separate project available from
      GitHub (see <xref linkend="VIPR.intro.links" />).  The
      <code>pveclib</code> project builds on top of the intrinsics
      described in this manual to provide higher-level vector
      interfaces that are highly portable.  The goals of the project
      include:
      </para>
      <itemizedlist>
 	<listitem>
 	  <para>
 	    Providing equivalent functions across versions of the
 	    POWER ISA.  For example, the <emphasis>Vector
 	    Multiply-by-10 Unsigned Quadword</emphasis> operation
 	    introduced in POWER ISA 3.0 (POWER9) can be implemented
 	    using a few vector instructions on earlier POWER ISA
 	    versions.
 	  </para>
 	</listitem>
 	<listitem>
 	  <para>
 	    Providing equivalent functions across compiler versions.
 	    For example, intrinsics provided in later versions of the
 	    compiler can be implemented as inline functions with
 	    inline assembly in earlier compiler versions.
 	  </para>
 	</listitem>
 	<listitem>
 	  <para>
 	    Providing higher-order functions not provided directly by
 	    the POWER ISA.  One example is a vector SIMD implementation
 	    for ASCII <code>__isalpha</code> and similar functions.
 	    Another example is full <code>__int128</code>
 	    implementations of <emphasis>Count Leading
 	    Zeroes</emphasis>, <emphasis>Population Count</emphasis>,
 	    and <emphasis>Multiply</emphasis>.
 	  </para>
 	</listitem>
      </itemizedlist>
    </section>
  </section>
 </chapter>
--- a/Intrinsics_Reference/fig-fpr-vsr.png
+++ b/Intrinsics_Reference/fig-fpr-vsr.png
--- a/Intrinsics_Reference/fig-vr-vsr.png
+++ b/Intrinsics_Reference/fig-vr-vsr.png