Significant updates to chapters 1-3. Delete old outline file.

pull/69/head
Bill Schmidt 5 years ago
parent 57b40b4d84
commit c086fbb288

@ -22,11 +22,11 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">

<para>
To ensure portability of applications optimized to exploit the
SIMD functions of POWER ISA processors, the ELF V2 ABI defines a
set of functions and data types for SIMD programming. ELF
V2-compliant compilers will provide suitable support for these
functions, preferably as built-in functions that translate to one
or more POWER ISA instructions.
SIMD functions of POWER ISA processors, this reference defines a
set of functions and data types for SIMD programming. Compliant
compilers will provide suitable support for these functions,
preferably as built-in functions that translate to one or more
POWER ISA instructions.
</para>
<para>
Compilers are encouraged, but not required, to provide built-in
@ -43,27 +43,26 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
built-in functions are implemented with different instruction
sequences for LE and BE. To achieve this, vector built-in
functions provide a set of functions derived from the set of
hardware functions provided by the Power vector SIMD
instructions. Unlike traditional “hardware intrinsic” built-in
functions, no fixed mapping exists between these built-in
functions and the generated hardware instruction sequence. Rather,
the compiler is free to generate optimized instruction sequences
that implement the semantics of the program specified by the
programmer using these built-in functions.
</para>
<para>
This is primarily applicable to the POWER SIMD instructions. As
we've seen, this set of instructions operates on groups of 2, 4,
8, or 16 vector elements at a time in 128-bit registers. On a
big-endian POWER platform, vector elements are loaded from memory
into a register so that the 0th element occupies the high-order
bits of the register, and the (N &#8211; 1)th element occupies the
low-order bits of the register. This is referred to as big-endian
element order. On a little-endian POWER platform, vector elements
are loaded from memory such that the 0th element occupies the
low-order bits of the register, and the (N &#8211; 1)th element
occupies the high-order bits. This is referred to as little-endian
element order.
hardware functions provided by the POWER SIMD instructions. Unlike
traditional “hardware intrinsic” built-in functions, no fixed
mapping exists between these built-in functions and the generated
hardware instruction sequence. Rather, the compiler is free to
generate optimized instruction sequences that implement the
semantics of the program specified by the programmer using these
built-in functions.
</para>
<para>
As we've seen, the POWER SIMD instructions operate on groups of 1,
2, 4, 8, or 16 vector elements at a time in 128-bit registers. On
a big-endian POWER platform, vector elements are loaded from
memory into a register so that the 0th element occupies the
high-order bits of the register, and the (N &#8211; 1)th element
occupies the low-order bits of the register. This is referred to
as big-endian element order. On a little-endian POWER platform,
vector elements are loaded from memory such that the 0th element
occupies the low-order bits of the register, and the (N &#8211;
1)th element occupies the high-order bits. This is referred to as
little-endian element order.
</para>
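<para>
As a rough illustration of the element numbering described above,
the following sketch copies four integers from memory into a
128-bit vector and reads back one element. It assumes the GCC/Clang
generic vector extension (<code>vector_size</code>) as a portable
stand-in for <code>vector int</code>. The names <code>v4si</code>
and <code>element_at</code> are hypothetical and are not part of
this reference.
</para>

```c
#include <string.h>

/* Assumption: GCC/Clang generic vector type standing in for `vector int`. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: load four ints from memory and return element i.
   Element 0 is the int at the lowest address in both element orders;
   only its position inside the register differs between BE and LE. */
int element_at(const int *mem, int i)
{
    v4si v;
    memcpy(&v, mem, sizeof v);   /* byte copy, so no alignment requirement */
    return v[i];
}
```

<para>
For an array holding 10, 20, 30, 40, <code>element_at(mem, 0)</code>
yields 10 on both big-endian and little-endian targets, which is the
property that lets the same source be used in either mode.
</para>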

<note>
@ -74,6 +73,46 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
</note>

<section>
<title>Language Elements</title>
<para>
The C and C++ languages are extended to use new identifiers
<code>vector</code>, <code>pixel</code>, <code>bool</code>,
<code>__vector</code>, <code>__pixel</code>, and
<code>__bool</code>. These keywords are used to specify vector
data types (<xref linkend="VIPR.ch-data-types" />). Because
these identifiers may conflict with keywords in more recent C
and C++ language standards, compilers may implement these in one
of two ways.
</para>
<itemizedlist>
<listitem>
<para>
<code>__vector</code>, <code>__pixel</code>,
<code>__bool</code>, and <code>bool</code> are defined as
keywords, with <code>vector</code> and <code>pixel</code> as
predefined macros that expand to <code>__vector</code> and
<code>__pixel</code>, respectively.
</para>
</listitem>
<listitem>
<para>
<code>__vector</code>, <code>__pixel</code>, and
<code>__bool</code> are defined as keywords in all contexts,
while <code>vector</code>, <code>pixel</code>, and
<code>bool</code> are treated as keywords only within the
context of a type declaration.
</para>
</listitem>
</itemizedlist>
<para>
Vector literals may be specified using a type cast and a set of
literal initializers in parentheses or braces. For example,
</para>
<programlisting>vector int x = (vector int) (4, -1, 3, 6);
vector double g = (vector double) { 3.5, -24.6 };</programlisting>
</section>

<section xml:id="VIPR.ch-data-types">
<title>Vector Data Types</title>
<para>
Languages provide support for the data types in <xref
@ -84,13 +123,8 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
For the C and C++ programming languages (and related/derived
languages), these data types may be accessed based on the type
names listed in <xref linkend="VIPR.biendian.vectypes" /> when
Power ISA SIMD language extensions are enabled using either the
<code>vector</code> or <code>__vector</code> keywords. [FIXME:
We haven't talked about these at all. Need to borrow some
description from the AltiVec PIM about the usage of vector,
bool, and pixel, and supplement with the problems this causes
with strict-ANSI C++. Maybe a separate section on "Language
Elements" should precede this one.]
POWER SIMD language extensions are enabled using either the
<code>vector</code> or <code>__vector</code> keywords.
</para>
<para>
For the Fortran language, <xref
@ -126,6 +160,11 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
such as <code>vec_xl</code> and <code>vec_xst</code> are
provided for unaligned data access.
</para>
<para>
One vector type may be cast to another vector type without
restriction. Such a cast is simply a reinterpretation of the
bits, and does not change the data.
</para>
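<para>
A minimal sketch of such a cast follows. It again assumes the
GCC/Clang generic vector extension in place of <code>vector
float</code> and <code>vector unsigned int</code>, and an IEEE 754
single-precision <code>float</code>. The type and function names are
hypothetical.
</para>

```c
/* Assumption: generic vector types standing in for
   `vector float` and `vector unsigned int`. */
typedef float        v4sf __attribute__((vector_size(16)));
typedef unsigned int v4su __attribute__((vector_size(16)));

/* Hypothetical helper: return the raw bits of x as seen through a vector cast. */
unsigned int first_element_bits(float x)
{
    v4sf f = { x, x, x, x };
    v4su u = (v4su) f;   /* reinterprets the 128 bits; no float-to-int conversion */
    return u[0];
}
```

<para>
With these assumptions, <code>first_element_bits(1.0f)</code> returns
<code>0x3F800000</code>, the bit pattern of 1.0, rather than the
integer 1. A value conversion needs an explicit conversion built-in
instead of a cast.
</para>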
<para>
Compilers are expected to recognize and optimize multiple
operations that can be optimized into a single hardware
@ -252,6 +291,21 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
2<superscript>16</superscript> &#8211; 1.</para>
</entry>
</row>
<row>
<entry>
<para>vector pixel</para>
</entry>
<entry>
<para>16</para>
</entry>
<entry>
<para>Quadword</para>
</entry>
<entry>
<para>Vector of 8 halfwords, each interpreted as a 1-bit
channel and three 5-bit channels.</para>
</entry>
</row>
<row>
<entry>
<para>vector unsigned int</para>
@ -424,11 +478,9 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
<title>Vector Operators</title>
<para>
In addition to the dereference and assignment operators, the
Power SIMD Vector Programming API [FIXME: If we're going to use
a term like this, let's use it consistently; also, SIMD and
Vector are redundant] provides the usual operators that are
valid on pointers; these operators are also valid for pointers
to vector types.
POWER Bi-Endian Vector Programming Model provides the usual
operators that are valid on pointers; these operators are also
valid for pointers to vector types.
</para>
<para>
The traditional C/C++ operators are defined on vector types
@ -580,7 +632,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
bits are discarded before performing a memory access. These
instructions load and store data in accordance with the
program's current endian mode, and do not need to be adapted
by the compiler to reflect little-endian operating during code
by the compiler to reflect little-endian operation during code
generation.
</para>
<table frame="all" pgwide="1" xml:id="VIPR.biendian.vmx-mem">
@ -683,7 +735,7 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
Previous versions of the VMX built-in functions defined
intrinsics to access the VMX instructions <code>lvsl</code>
and <code>lvsr</code>, which could be used in conjunction with
<code>vec_vperm</code> and VMX load and store instructions for
<code>vec_perm</code> and VMX load and store instructions for
unaligned access. The <code>vec_lvsl</code> and
<code>vec_lvsr</code> interfaces are deprecated in favor of
the interfaces specified here. For compatibility, the
@ -694,12 +746,14 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
discouraged and usually results in worse performance. It is
recommended (but not required) that compilers issue a warning
when these functions are used in little-endian
environments. It is recommended that programmers use the
<code>vec_xl</code> and <code>vec_xst</code> vector built-in
functions to access unaligned data streams. See the
descriptions of these instructions in <xref
linkend="VIPR.vec-ref" /> for further description and
implementation details.
environments.
</para>
<para>
It is recommended that programmers use the <code>vec_xl</code>
and <code>vec_xst</code> vector built-in functions to access
unaligned data streams. See the descriptions of these
functions in <xref linkend="VIPR.vec-ref" /> for further
details on their use and implementation.
</para>
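<para>
The recommendation above might be applied as in the following
sketch, which copies a buffer in 16-byte blocks without assuming any
alignment. The function name <code>copy_blocks</code> is
hypothetical, and the <code>memcpy</code> branch is only an assumed
fallback so that the file still builds where VSX is unavailable.
</para>

```c
#include <stddef.h>

#if defined(__VSX__)
#include <altivec.h>

/* Copy n bytes (n assumed to be a multiple of 16) using unaligned
   vector loads and stores. The first argument of vec_xl and the second
   argument of vec_xst are byte offsets added to the pointer. */
void copy_blocks(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i += 16) {
        __vector unsigned char v = vec_xl((long) i, src);
        vec_xst(v, (long) i, dst);
    }
}
#else
#include <string.h>

/* Assumed fallback for targets without VSX; not part of this reference. */
void copy_blocks(unsigned char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n);
}
#endif
```

<para>
Because neither pointer has to be 16-byte aligned, the same routine
can be called on arbitrary offsets into a byte stream, with no
<code>vec_lvsl</code> or <code>vec_perm</code> fix-up.
</para>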
</section>
<section>

@ -128,12 +128,87 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">

<section xml:id="VIPR.intro.unified">
<title>The Unified Vector Register Set</title>
<para>filler</para>
<para>
In OpenPOWER-compliant processors, floating-point and vector
operations are implemented using a unified vector-scalar model.
As shown in <xref linkend="FPR-VSR" /> and <xref
linkend="VR-VSR" />, there are 64 vector-scalar registers; each
is 128 bits wide.
</para>
<para>
The vector-scalar registers can be addressed with VSX
instructions, for vector and scalar processing of all 64
registers, or with the "classic" POWER floating-point
instructions to refer to a 32-register subset of these, having
64 bits per register. They can also be addressed with VMX
instructions to refer to a 32-register subset of 128-bit registers.
</para>
<figure pgwide="1" xml:id="FPR-VSR">
<title>Floating-Point Registers as Part of VSRs</title>
<mediaobject>
<imageobject>
<imagedata fileref="fig-fpr-vsr.png" format="PNG"
scalefit="1" width="100%" />
</imageobject>
</mediaobject>
</figure>
<figure pgwide="1" xml:id="VR-VSR">
<title>Vector Registers as Part of VSRs</title>
<mediaobject>
<imageobject>
<imagedata fileref="fig-vr-vsr.png" format="PNG"
scalefit="1" width="100%" />
</imageobject>
</mediaobject>
</figure>
</section>

<section xml:id="VIPR.intro.links">
<title>Useful Links</title>
<para>filler</para>
<para>
The following documents provide additional reference materials.
</para>
<itemizedlist>
<listitem>
<para>
<emphasis>64-Bit ELF V2 ABI Specification - Power
Architecture.</emphasis>
<emphasis>
<link xlink:href="https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture">https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>AltiVec Technology Program Interface
Manual.</emphasis>
<emphasis>
<link xlink:href="https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf">https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>Intel Architecture Instruction Set Extensions and
Future Features Programming Reference.</emphasis>
<emphasis>
<link xlink:href="https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf">https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>Power Vector Library.</emphasis>
<emphasis>
<link xlink:href="https://github.com/open-power-sdk/pveclib">https://github.com/open-power-sdk/pveclib
</link>
</emphasis>
</para>
</listitem>
</itemizedlist>
</section>

</chapter>

@ -1,45 +0,0 @@
<!--
Copyright (c) 2016 OpenPOWER Foundation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_outline">
<!-- Chapter Title goes here. -->
<title>Notes on what to include</title>

<itemizedlist spacing="compact">
<listitem>
<para>Rewrite the material from ABI Chapter 6</para>
</listitem>
<listitem>
<para>Recommendations for different ways to create efficient vector
code
<itemizedlist spacing="compact">
<listitem>
<para>Portable: C,C++; tricks to help compiler vectorize code</para>
</listitem>
<listitem>
<para>Use intrinsics</para>
</listitem>
<listitem>
<para>Assembly code - not recommended, but if you must</para>
</listitem>
</itemizedlist>
</para>
</listitem>
</itemizedlist>

</chapter>

@ -51,6 +51,92 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
<section>
<title>Use Assembly Code Sparingly</title>
<para>filler</para>
<section>
<title>Inline Assembly</title>
<para>filler</para>
</section>
<section>
<title>Assembly Files</title>
<para>filler</para>
</section>
</section>

<section>
<title>Other Vector Programming APIs</title>
<para>In addition to the intrinsic functions provided in this
reference, programmers should be aware of other vector programming
API resources.</para>
<section>
<title>x86 Vector Portability Headers</title>
<para>
Recent versions of the <code>gcc</code> and <code>clang</code>
open source compilers provide "drop-in" portability headers
for portions of the Intel Architecture Instruction Set
Extensions (see <xref linkend="VIPR.intro.links" />). These
headers mirror the APIs of Intel headers having the same
names. Support is provided for the MMX and SSE layers, up
through SSE4. At this time, no support for the AVX layers is
envisioned.
</para>
<para>
The portability headers provide the same semantics as the
corresponding Intel APIs, but use VMX and VSX instructions
to emulate the Intel vector instructions. It should be
emphasized that these headers are provided for portability,
and will not necessarily perform optimally (although in many
cases the performance is very good). Using these headers is
often a good first step in porting a library using Intel
intrinsics to POWER, after which more detailed rewriting of
algorithms is usually desirable for best performance.
</para>
<para>
The portability APIs become available automatically when a
program includes one of the corresponding Intel header files,
such as <code>&lt;mmintrin.h&gt;</code>.
</para>
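<para>
As a small sketch of what such ported code can look like, the
function below adds four pairs of floats using only SSE intrinsics
from <code>&lt;xmmintrin.h&gt;</code>. The function name
<code>add4_sse</code> is hypothetical. Note, as an assumption to
check against your compiler's documentation, that GCC's POWER
versions of these headers have required
<code>NO_WARN_X86_INTRINSICS</code> to be defined before they can be
included.
</para>

```c
/* On POWER with GCC, compile with -DNO_WARN_X86_INTRINSICS (assumption:
   required by the GCC portability headers; other compilers may differ). */
#include <xmmintrin.h>

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3, written
   against the Intel SSE API. The pointers need not be 16-byte aligned. */
void add4_sse(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

<para>
The same source builds unchanged on x86, which makes it a reasonable
starting point before hot loops are rewritten with the native
intrinsics described in this reference.
</para>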
</section>
<section>
<title>The POWER Vector Library (pveclib)</title>
<para>The POWER Vector Library, also known as
<code>pveclib</code>, is a separate project available from
GitHub (see <xref linkend="VIPR.intro.links" />). The
<code>pveclib</code> project builds on top of the intrinsics
described in this manual to provide higher-level vector
interfaces that are highly portable. The goals of the project
include:
</para>
<itemizedlist>
<listitem>
<para>
Providing equivalent functions across versions of the
POWER ISA. For example, the <emphasis>Vector
Multiply-by-10 Unsigned Quadword</emphasis> operation
introduced in POWER ISA 3.0 (POWER9) can be implemented
using a few vector instructions on earlier POWER ISA
versions.
</para>
</listitem>
<listitem>
<para>
Providing equivalent functions across compiler versions.
For example, intrinsics provided in later versions of the
compiler can be implemented as inline functions with
inline asm in earlier compiler versions.
</para>
</listitem>
<listitem>
<para>
Providing higher-order functions that the POWER ISA does not
offer directly. One example is a vector SIMD implementation
for ASCII <code>__isalpha</code> and similar functions.
Another example is full <code>__int128</code>
implementations of <emphasis>Count Leading
Zeroes</emphasis>, <emphasis>Population Count</emphasis>,
and <emphasis>Multiply</emphasis>.
</para>
</listitem>
</itemizedlist>
</section>
</section>

</chapter>

Binary file not shown.


Binary file not shown.

