Significant updates to chapters 1-3. Delete old outline file.

pull/69/head
Bill Schmidt 5 years ago
parent 57b40b4d84
commit c086fbb288

@ -22,11 +22,11 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">


<para>
To ensure portability of applications optimized to exploit the
SIMD functions of POWER ISA processors, this reference defines a
set of functions and data types for SIMD programming. Compliant
compilers will provide suitable support for these functions,
preferably as built-in functions that translate to one or more
POWER ISA instructions.
</para>
<para>
Compilers are encouraged, but not required, to provide built-in
@ -43,27 +43,26 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
built-in functions are implemented with different instruction
sequences for LE and BE. To achieve this, vector built-in
functions provide a set of functions derived from the set of
hardware functions provided by the POWER SIMD instructions. Unlike
traditional “hardware intrinsic” built-in functions, no fixed
mapping exists between these built-in functions and the generated
hardware instruction sequence. Rather, the compiler is free to
generate optimized instruction sequences that implement the
semantics of the program specified by the programmer using these
built-in functions.
</para>
<para>
As we've seen, the POWER SIMD instructions operate on groups of 1,
2, 4, 8, or 16 vector elements at a time in 128-bit registers. On
a big-endian POWER platform, vector elements are loaded from
memory into a register so that the 0th element occupies the
high-order bits of the register, and the (N &#8211; 1)th element
occupies the low-order bits of the register. This is referred to
as big-endian element order. On a little-endian POWER platform,
vector elements are loaded from memory such that the 0th element
occupies the low-order bits of the register, and the (N &#8211;
1)th element occupies the high-order bits. This is referred to as
little-endian element order.
</para>
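<para>
To make element order concrete, the following sketch (an
illustrative addition, assuming a compiler that provides the
<code>vec_xl</code> and <code>vec_extract</code> built-ins
described later in this reference) shows that element 0 always
corresponds to the lowest-addressed element in memory, whichever
register bits it happens to occupy:
</para>
<programlisting>#include &lt;altivec.h&gt;
#include &lt;stdio.h&gt;

int main (void)
{
  int mem[4] = { 10, 20, 30, 40 };
  vector signed int v = vec_xl (0, mem);
  /* Prints 10 on both big- and little-endian POWER platforms,
     although element 0 occupies different register bits.      */
  printf ("%d\n", vec_extract (v, 0));
  return 0;
}</programlisting>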


<note>
@ -74,6 +73,46 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
</note>


<section>
<title>Language Elements</title>
<para>
The C and C++ languages are extended to use new identifiers
<code>vector</code>, <code>pixel</code>, <code>bool</code>,
<code>__vector</code>, <code>__pixel</code>, and
<code>__bool</code>. These keywords are used to specify vector
data types (<xref linkend="VIPR.ch-data-types" />). Because
these identifiers may conflict with keywords in more recent C
and C++ language standards, compilers may implement these in one
of two ways.
</para>
<itemizedlist>
<listitem>
<para>
<code>__vector</code>, <code>__pixel</code>,
<code>__bool</code>, and <code>bool</code> are defined as
keywords, with <code>vector</code> and <code>pixel</code> as
predefined macros that expand to <code>__vector</code> and
<code>__pixel</code>, respectively.
</para>
</listitem>
<listitem>
<para>
<code>__vector</code>, <code>__pixel</code>, and
<code>__bool</code> are defined as keywords in all contexts,
while <code>vector</code>, <code>pixel</code>, and
<code>bool</code> are treated as keywords only within the
context of a type declaration.
</para>
</listitem>
</itemizedlist>
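<para>
Under either implementation, both spellings denote the same
types. A brief sketch (illustrative, not part of the original
text):
</para>
<programlisting>__vector signed int a;   /* always available as a keyword            */
vector signed int b;     /* equivalent, via macro expansion or        */
                         /* contextual keyword, per the schemes above */</programlisting>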
<para>
Vector literals may be specified using a type cast and a set of
literal initializers in parentheses or braces. For example,
</para>
<programlisting>vector int x = (vector int) (4, -1, 3, 6);
vector double g = (vector double) { 3.5, -24.6 };</programlisting>
</section>

<section xml:id="VIPR.ch-data-types">
<title>Vector Data Types</title>
<para>
Languages provide support for the data types in <xref
@ -84,13 +123,8 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
For the C and C++ programming languages (and related/derived
languages), these data types may be accessed based on the type
names listed in <xref linkend="VIPR.biendian.vectypes" /> when
POWER SIMD language extensions are enabled using either the
<code>vector</code> or <code>__vector</code> keywords.
</para>
<para>
For the Fortran language, <xref
@ -126,6 +160,11 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
such as <code>vec_xl</code> and <code>vec_xst</code> are
provided for unaligned data access.
</para>
<para>
One vector type may be cast to another vector type without
restriction. Such a cast is simply a reinterpretation of the
bits, and does not change the data.
</para>
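<para>
For example (an illustrative sketch added here), a vector of four
32-bit integers can be viewed as sixteen bytes without altering
the underlying bits:
</para>
<programlisting>vector signed int vsi = { -1, 2, -3, 4 };
/* Reinterpretation only; no conversion of the data is performed. */
vector unsigned char vuc = (vector unsigned char) vsi;</programlisting>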
<para>
Compilers are expected to recognize and optimize multiple
operations that can be optimized into a single hardware
@ -252,6 +291,21 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
2<superscript>16</superscript> &#8211; 1.</para>
</entry>
</row>
<row>
<entry>
<para>vector pixel</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 8 halfwords, each interpreted as a 1-bit
channel and three 5-bit channels.</para>
</entry>
</row>
<row>
<entry>
<para>vector unsigned int</para>
@ -424,11 +478,9 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
<title>Vector Operators</title>
<para>
In addition to the dereference and assignment operators, the
POWER Bi-Endian Vector Programming Model provides the usual
operators that are valid on pointers; these operators are also
valid for pointers to vector types.
</para>
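<para>
As a brief illustrative sketch (not part of the original text),
pointer arithmetic on a pointer to a vector type advances in
units of the 16-byte vector size, just as for any other
pointed-to type:
</para>
<programlisting>vector float data[4];
vector float *p = &amp;data[0];
vector float v = *p;   /* dereference loads one 16-byte vector       */
p = p + 2;             /* advances by 2 * 16 = 32 bytes, to data[2]  */
v = p[1];              /* array indexing is likewise valid: data[3]  */</programlisting>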
<para>
The traditional C/C++ operators are defined on vector types
@ -580,7 +632,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
bits are discarded before performing a memory access. These
instructions load and store data in accordance with the
program's current endian mode, and do not need to be adapted
by the compiler to reflect little-endian operation during code
generation.
</para>
<table frame="all" pgwide="1" xml:id="VIPR.biendian.vmx-mem">
@ -683,7 +735,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
Previous versions of the VMX built-in functions defined
intrinsics to access the VMX instructions <code>lvsl</code>
and <code>lvsr</code>, which could be used in conjunction with
<code>vec_perm</code> and VMX load and store instructions for
unaligned access. The <code>vec_lvsl</code> and
<code>vec_lvsr</code> interfaces are deprecated in accordance
with the interfaces specified here. For compatibility, the
@ -694,12 +746,14 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
discouraged and usually results in worse performance. It is
recommended (but not required) that compilers issue a warning
when these functions are used in little-endian
environments.
</para>
<para>
It is recommended that programmers use the <code>vec_xl</code>
and <code>vec_xst</code> vector built-in functions to access
unaligned data streams. See the descriptions of these
instructions in <xref linkend="VIPR.vec-ref" /> for further
description and implementation details.
</para>
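<para>
A minimal sketch of the recommended approach (assuming a compiler
that provides the <code>vec_xl</code> and <code>vec_xst</code>
built-ins; the helper function name is hypothetical):
</para>
<programlisting>#include &lt;altivec.h&gt;

/* Copy n floats (n a multiple of 4) between possibly unaligned
   buffers using the unaligned load/store built-ins.            */
void copy_floats (float *dst, float *src, int n)
{
  for (int i = 0; i &lt; n; i += 4)
    {
      vector float v = vec_xl (0, &amp;src[i]);   /* unaligned load  */
      vec_xst (v, 0, &amp;dst[i]);                /* unaligned store */
    }
}</programlisting>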
</section>
<section>

@ -128,12 +128,87 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">


<section xml:id="VIPR.intro.unified"> <section xml:id="VIPR.intro.unified">
<title>The Unified Vector Register Set</title> <title>The Unified Vector Register Set</title>
<para>filler</para> <para>
In OpenPOWER-compliant processors, floating-point and vector
operations are implemented using a unified vector-scalar model.
As shown in <xref linkend="FPR-VSR" /> and <xref
linkend="VR-VSR" />, there are 64 vector-scalar registers; each
is 128 bits wide.
</para>
<para>
The vector-scalar registers can be addressed with VSX
instructions for vector and scalar processing of all 64
registers, or with the "classic" POWER floating-point
instructions, which refer to a 32-register subset of these
with 64 bits per register. They can also be addressed with VMX
instructions, which refer to a 32-register subset of the full
128-bit registers.
</para>
<figure pgwide="1" xml:id="FPR-VSR">
<title>Floating-Point Registers as Part of VSRs</title>
<mediaobject>
<imageobject>
<imagedata fileref="fig-fpr-vsr.png" format="PNG"
scalefit="1" width="100%" />
</imageobject>
</mediaobject>
</figure>
<figure pgwide="1" xml:id="VR-VSR">
<title>Vector Registers as Part of VSRs</title>
<mediaobject>
<imageobject>
<imagedata fileref="fig-vr-vsr.png" format="PNG"
scalefit="1" width="100%" />
</imageobject>
</mediaobject>
</figure>
</section>


<section xml:id="VIPR.intro.links"> <section xml:id="VIPR.intro.links">
<title>Useful Links</title> <title>Useful Links</title>
<para>filler</para> <para>
The following documents provide additional reference materials.
</para>
<itemizedlist>
<listitem>
<para>
<emphasis>64-Bit ELF V2 ABI Specification - Power
Architecture.</emphasis>
<emphasis>
<link xlink:href="https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture">https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>AltiVec Technology Program Interface
Manual.</emphasis>
<emphasis>
<link xlink:href="https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf">https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>Intel Architecture Instruction Set Extensions and
Future Features Programming Reference.</emphasis>
<emphasis>
<link xlink:href="https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf">https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>Power Vector Library.</emphasis>
<emphasis>
<link xlink:href="https://github.com/open-power-sdk/pveclib">https://github.com/open-power-sdk/pveclib
</link>
</emphasis>
</para>
</listitem>
</itemizedlist>
</section>


</chapter>

@ -1,45 +0,0 @@
<!--
Copyright (c) 2016 OpenPOWER Foundation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_outline">
<!-- Chapter Title goes here. -->
<title>Notes on what to include</title>

<itemizedlist spacing="compact">
<listitem>
<para>Rewrite the material from ABI Chapter 6</para>
</listitem>
<listitem>
<para>Recommendations for different ways to create efficient vector
code
<itemizedlist spacing="compact">
<listitem>
<para>Portable: C,C++; tricks to help compiler vectorize code</para>
</listitem>
<listitem>
<para>Use intrinsics</para>
</listitem>
<listitem>
<para>Assembly code - not recommended, but if you must</para>
</listitem>
</itemizedlist>
</para>
</listitem>
</itemizedlist>

</chapter>

@ -51,6 +51,92 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
<section>
<title>Use Assembly Code Sparingly</title>
<para>filler</para>
<section>
<title>Inline Assembly</title>
<para>filler</para>
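<para>
A hedged sketch of GCC-style extended inline assembly (assuming
the <code>"v"</code> register constraint for VMX registers; the
function name is illustrative):
</para>
<programlisting>/* Add four 32-bit integers with an explicit VMX instruction.
   Prefer the vec_add built-in; this form bypasses the compiler's
   instruction selection and scheduling.                          */
vector signed int vadd_asm (vector signed int a, vector signed int b)
{
  vector signed int r;
  __asm__ ("vadduwm %0,%1,%2" : "=v" (r) : "v" (a), "v" (b));
  return r;
}</programlisting>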
</section>
<section>
<title>Assembly Files</title>
<para>filler</para>
</section>
</section>

<section>
<title>Other Vector Programming APIs</title>
<para>In addition to the intrinsic functions provided in this
reference, programmers should be aware of other vector programming
API resources.</para>
<section>
<title>x86 Vector Portability Headers</title>
<para>
Recent versions of the <code>gcc</code> and <code>clang</code>
open source compilers provide "drop-in" portability headers
for portions of the Intel Architecture Instruction Set
Extensions (see <xref linkend="VIPR.intro.links" />). These
headers mirror the APIs of Intel headers having the same
names. Support is provided for the MMX and SSE layers, up
through SSE4. At this time, no support for the AVX layers is
envisioned.
</para>
<para>
The portability headers provide the same semantics as the
corresponding Intel APIs, but using VMX and VSX instructions
to emulate the Intel vector instructions. It should be
emphasized that these headers are provided for portability,
and will not necessarily perform optimally (although in many
cases the performance is very good). Using these headers is
often a good first step in porting a library using Intel
intrinsics to POWER, after which more detailed rewriting of
algorithms is usually desirable for best performance.
</para>
<para>
Access to the portability APIs occurs automatically when
including one of the corresponding Intel header files, such as
<code>&lt;mmintrin.h&gt;</code>.
</para>
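<para>
A minimal sketch (assuming the GCC/Clang portability headers;
some GCC versions additionally require defining
<code>NO_WARN_X86_INTRINSICS</code>, which is an assumption
beyond the text above):
</para>
<programlisting>#include &lt;emmintrin.h&gt;   /* SSE2 API, emulated with VMX/VSX on POWER */

/* Add four packed 32-bit integers using the Intel intrinsic API. */
__m128i add4 (__m128i a, __m128i b)
{
  return _mm_add_epi32 (a, b);
}</programlisting>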
</section>
<section>
<title>The POWER Vector Library (pveclib)</title>
<para>The POWER Vector Library, also known as
<code>pveclib</code>, is a separate project available from
github (see <xref linkend="VIPR.intro.links" />). The
<code>pveclib</code> project builds on top of the intrinsics
described in this manual to provide higher-level vector
interfaces that are highly portable. The goals of the project
include:
</para>
<itemizedlist>
<listitem>
<para>
Providing equivalent functions across versions of the
PowerISA. For example, the <emphasis>Vector
Multiply-by-10 Unsigned Quadword</emphasis> operation
introduced in PowerISA 3.0 (POWER9) can be implemented
using a few vector instructions on earlier PowerISA
versions.
</para>
</listitem>
<listitem>
<para>
Providing equivalent functions across compiler versions.
For example, intrinsics provided in later versions of the
compiler can be implemented as inline functions with
inline asm in earlier compiler versions.
</para>
</listitem>
<listitem>
<para>
Providing higher-order functions not provided directly by
the PowerISA. One example is a vector SIMD implementation
for ASCII <code>__isalpha</code> and similar functions.
Another example is full <code>__int128</code>
implementations of <emphasis>Count Leading
Zeroes</emphasis>, <emphasis>Population Count</emphasis>,
and <emphasis>Multiply</emphasis>.
</para>
</listitem>
</itemizedlist>
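<para>
An illustrative sketch of the first goal (the header path, the
<code>vui128_t</code> type, and the <code>vec_mul10uq</code>
interface are taken to be pveclib's and should be verified
against the project documentation):
</para>
<programlisting>#include &lt;pveclib/vec_int128_ppc.h&gt;

/* Multiply a 128-bit unsigned integer by 10.  On POWER9 this can
   map to the vmul10uq instruction; on earlier processors pveclib
   substitutes an equivalent short vector instruction sequence.   */
vui128_t times_ten (vui128_t x)
{
  return vec_mul10uq (x);
}</programlisting>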
</section>
</section>


</chapter>

(Two binary image files added: the figures fig-fpr-vsr.png and fig-vr-vsr.png referenced above, 19 KiB and 27 KiB.)
