You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

418 lines
17 KiB

<!--
Copyright (c) 2019 OpenPOWER Foundation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
<!-- Chapter Title goes here. -->
<title>Introduction to Vector Programming on Power</title>
<section>
<title>A Brief History</title>
<para>
The history of vector programming on <phrase
revisionflag="changed"><trademark
class="registered">Power</trademark></phrase> processors begins
with the AIM (Apple, IBM, Motorola) alliance in the 1990s. The
AIM partners developed the Power Vector Media Extension (VMX) to
accelerate multimedia applications, particularly image
processing. VMX is the name still used by IBM for this
instruction set. Freescale (formerly Motorola) used the
trademark <phrase revisionflag="changed"><trademark
class="trade">AltiVec</trademark>,</phrase> while Apple at one
time called it "Velocity
Engine." While VMX remains the most official name, the term
AltiVec is still in common use today. Freescale's AltiVec
Technology Programming Interface Manual (the "AltiVec PIM") is
still available online for reference (see <xref
linkend="VIPR.intro.links" />).
</para>
<para>
The original VMX specification provided for thirty-two 128-bit
vector registers (VRs). Each register can be treated as
containing sixteen 8-bit character values, eight 16-bit short
integer values, four 32-bit integer values, or four 32-bit
single-precision floating-point values (the "VMX data types").
Furthermore, the integer data types have signed, unsigned, and
boolean variants. An extensive set of arithmetic, logical,
comparison, conversion, memory access, and permute class
operations were specified to operate on these registers.
</para>
<para>
The AltiVec PIM documents intrinsic functions to be used by
programmers to access the VMX instruction set. Because similar
operations are provided for all the VMX data types, the PIM
provides for overloaded intrinsics that can operate on different
data types. However, such function overloading is not normally
acceptable in the C programming language, so compilers compliant
with the AltiVec PIM (such as GCC and Clang) were required to
add special handling to their parsers to permit this. The PIM
suggested (but did not mandate) the use of a header file,
<code>&lt;altivec.h&gt;</code>, for implementations that provide
AltiVec intrinsics. This is common practice for all compliant
compilers today.
</para>
<para>
The first chips incorporating the VMX instruction set were
introduced by Freescale in 1999, and used primarly in Apple
desktop computers. IBM's last desktop CPU (the PowerPC 970)
also included AltiVec support, and was used in the Apple
PowerMac G5. IBM initially omitted support for VMX from its
server-class computers, but added support for it in the POWER6
<phrase revisionflag="added">processor-based</phrase> server
family.
</para>
<para>
IBM extended VMX by introducing the Vector-Scalar Extension
(VSX) for the <phrase revisionflag="changed"><trademark
class="registered">POWER7</trademark></phrase> family of
processors. VSX adds sixty-four
128-bit vector-scalar registers (VSRs); however, to optimize the amount
of per-process register state, the registers overlap with the
VRs and the scalar floating-point registers (FPRs) (see <xref
linkend="VIPR.intro.unified" />). The VSRs can represent all
the data types representable by the VRs, and can also be treated
as containing two 64-bit integers or two 64-bit double-precision
floating-point values. However, ISA support for two 64-bit
integers in VSRs was limited until Version 2.07 (<phrase
revisionflag="changed"><trademark
class="registered">POWER8</trademark></phrase>) of the
Power ISA, and only the VRs are supported for these
instructions.
</para>
<para>
Both the VMX and VSX instruction sets have been expanded for the
<phrase revisionflag="changed">POWER8, <trademark
class="registered">POWER9</trademark>, and <trademark
class="registered">Power10</trademark></phrase> processor
families. Starting with POWER8,
a VSR can now contain a single 128-bit integer; and starting
with POWER9, a VSR can contain a single 128-bit IEEE floating-point
value. Again, the ISA currently only supports 128-bit
operations on values in the VRs.
</para>
<para>
The VMX and VSX instruction sets together may be referred to as
the Power SIMD (single-instruction, multiple-data)
instructions.
</para>
<section>
<title>Little-Endian Linux</title>
<para>
The Power architecture has supported operation in either
big-endian (BE) or little-endian (LE) mode from the
beginning. However, IBM's Power servers were only shipped
with big-endian operating systems (<phrase
revisionflag="changed"><trademark
class="registered">AIX</trademark>, IBM i, <trademark
class="registered">Linux</trademark></phrase>) prior to
the introduction of POWER8. With POWER8, IBM began
supporting little-endian Linux distributions for the first
time, and introduced a new application binary interface (the
64-Bit ELFv2 ABI Specification <xref
linkend="VIPR.intro.links" />) that can be used for either
big- or little-endian support. In practice, the ELFv2 ABI is
currently used only for little-endian Linux.
</para>
<para>
Although Power has always supported big- and little-endian
memory accesses, the introduction of vector register support
added a layer of complexity to programming for processors
operating in different endian modes. Arrays of elements
loaded into a VR or VSR will be indexed from left to right in
the register in big-endian mode, but will be indexed from
right to left in the register in little-endian mode. However,
the VMX and VSX instructions originally assumed that elements
will always be indexed from left to right in the register.
This is an inconvenience that needs to be hidden from the
application programmer wherever possible. To this end, IBM
developed a bi-endian vector programming model (see <xref
linkend="VIPR.biendian" />). The intrinsic functions provided
for the bi-endian vector programming model are described in
<xref linkend="VIPR.vec-ref" />.
</para>
</section>
</section>
<section xml:id="VIPR.intro.unified">
<title>The Unified Vector Register Set</title>
<para>
In <phrase revisionflag="changed"><trademark
class="trade">OpenPOWER</trademark>-compliant</phrase>
processors, floating-point and vector
operations are implemented using a unified vector-scalar model.
As shown in <xref linkend="FPR-VSR" /> and <xref
linkend="VR-VSR" />, there are 64 vector-scalar registers; each
is 128 bits wide.
</para>
<figure pgwide="1" xml:id="FPR-VSR">
<title>Floating-Point Registers as Part of VSRs</title>
<mediaobject>
<imageobject>
<imagedata fileref="fig-fpr-vsr.png" format="PNG"
scalefit="1" width="100%" />
</imageobject>
</mediaobject>
</figure>
<figure pgwide="1" xml:id="VR-VSR">
<title>Vector Registers as Part of VSRs</title>
<mediaobject>
<imageobject>
<imagedata fileref="fig-vr-vsr.png" format="PNG"
scalefit="1" width="100%" />
</imageobject>
</mediaobject>
</figure>
<para>
The vector-scalar registers can be addressed with VSX
instructions, for vector and scalar processing of all 64
registers, or with the "classic" Power floating-point
instructions to refer to a 32-register subset of these, having
64 bits per register. They can also be addressed with VMX
instructions to refer to a 32-register subset of 128-bit registers.
</para>
</section>
<section xml:id="VIPR.intro.reporting">
<title>Where to Report Bugs</title>
<para>
This reference provides guidance on using vector intrinsics that
are supported by all compatible compilers. If you find a
problem when using one of the intrinsics with a compatible
compiler, please report a bug! Bug reporting procedures differ
depending on which compiler you're using.
</para>
<itemizedlist>
<listitem>
<para>
<emphasis role="underline">GCC</emphasis>. The reporting
procedure for bugs against the GNU Compiler Collection is
described at <link
xlink:href="https://gcc.gnu.org/bugs/">https://gcc.gnu.org/bugs/</link>.
The GCC bugzilla tracker is located at <link
xlink:href="https://gcc.gnu.org/bugzilla/">https://gcc.gnu.org/bugzilla/</link>.
</para>
</listitem>
<listitem>
<para>
<emphasis role="underline">Clang/LLVM</emphasis>. The
reporting procedure for bugs against the Clang compiler is
described at <link
xlink:href="https://llvm.org/docs/HowToSubmitABug.html">https://llvm.org/docs/HowToSubmitABug.html</link>.
The LLVM bug tracking system is located at <link
xlink:href="https://bugs.llvm.org/enter_bug.cgi">https://bugs.llvm.org/enter_bug.cgi</link>.
</para>
</listitem>
<listitem>
<para>
<emphasis role="underline">The XL <phrase
revisionflag="added">and Open XL</phrase>
compilers</emphasis>. For XL <phrase
revisionflag="added">and Open XL</phrase> compilers provided
with the Linux Community Edition, you can provide feedback
to the XL compiler team via email
(<email>compinfo@cn.ibm.com</email>); for other editions of
XL <phrase revisionflag="added">and Open XL</phrase>
compilers, please open a <link
xlink:href="https://www.ibm.com/mysupport/s/">Case</link>.
</para>
</listitem>
</itemizedlist>
</section>
<section xml:id="VIPR.intro.links">
<title>Useful Links</title>
<para>
The following documents provide additional reference materials.
</para>
<itemizedlist>
<listitem>
<para>
<emphasis>64-Bit ELF V2 ABI Specification - Power
Architecture.</emphasis>
<emphasis>
<link xlink:href="https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture">https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>AltiVec Technology Program Interface
Manual.</emphasis>
<emphasis>
<!-- Note: the %2e substitution for the "." in the URL is needed to work around unknown issues
in the link creation in PDF documents -->
<link xlink:href="https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM%2epdf">https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>Intel Architecture Instruction Set Extensions and
Future Features Programming Reference.</emphasis>
<emphasis>
<!-- Note: the %2e substitution for the "." in the URL is needed to work around unknown issues
in the link creation in PDF documents -->
<link xlink:href="https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference%2epdf">https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>Power Instruction Set Architecture</emphasis>,
Version 3.0B Specification.
<emphasis>
<link xlink:href="https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0">https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
</link>
</emphasis>
<phrase revisionflag="added">
Need to update this to Version 3.1B, which is not yet published.
</phrase>
</para>
</listitem>
<listitem>
<para>
<emphasis>POWER8 Processor User's Manual for the Single-Chip
Module.</emphasis>
<emphasis>
<link xlink:href="https://ibm.ent.box.com/s/649rlau0zjcc0yrulqf4cgx5wk3pgbfk">https://ibm.ent.box.com/s/649rlau0zjcc0yrulqf4cgx5wk3pgbfk
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>POWER9 Processor User's Manual.</emphasis>
<emphasis>
<link
xlink:href="https://ibm.ent.box.com/s/tmklq90ze7aj8f4n32er1mu3sy9u8k3k">https://ibm.ent.box.com/s/tmklq90ze7aj8f4n32er1mu3sy9u8k3k
</link>
</emphasis>
</para>
</listitem>
<listitem revisionflag="added">
<para>
<emphasis>Power10 Processor User's Manual.</emphasis>
<emphasis>
<link
xlink:href="https://ibm.ent.box.com/s/tmklq90ze7aj8f4n32er1mu3sy9u8k3k">Not
yet available, link is to P9 user's manual
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>Power Vector Library.</emphasis>
<emphasis>
<link xlink:href="https://github.com/open-power-sdk/pveclib">https://github.com/open-power-sdk/pveclib
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>POWER8 In-Core Cryptography: The Unofficial
Guide.</emphasis>
<emphasis>
<link
xlink:href="https://github.com/noloader/POWER8-crypto/blob/master/power8-crypto.pdf">https://github.com/noloader/POWER8-crypto/blob/master/power8-crypto.pdf
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>Using the GNU Compiler Collection.</emphasis>
<emphasis>
<!-- Note: the %2e substitution for the "." in the URL is needed to work around unknown issues
in the link creation in PDF documents -->
<link xlink:href="https://gcc.gnu.org/onlinedocs/gcc%2epdf">https://gcc.gnu.org/onlinedocs/gcc.pdf
</link>
</emphasis>
</para>
</listitem>
<listitem>
<para>
<emphasis>GCC's Assembler Syntax.</emphasis> Felix Cloutier.
<emphasis>
<link xlink:href="https://www.felixcloutier.com/documents/gcc-asm.html">https://www.felixcloutier.com/documents/gcc-asm.html</link>
</emphasis>
</para>
</listitem>
<listitem revisionflag="added">
<para>
<emphasis>The GNU C Library Project.</emphasis>
<emphasis>
<link xlink:href="https://www.gnu.org/software/libc">https://www.gnu.org/software/libc</link>
</emphasis>
</para>
</listitem>
<listitem revisionflag="added">
<para>
<emphasis>Matrix-Multiply Assist Best Practices Guide.</emphasis>
<emphasis>
<link xlink:href="http://www.redbooks.ibm.com/redpapers/pdfs/redp5612.pdf">https://www.redbooks.ibm.com/redpapers/pdfs/redp5612.pdf</link>
</emphasis>
</para>
</listitem>
</itemizedlist>
</section>
<section xml:id="VIPR.intro.marks" revisionflag="added">
<title>Trademarks</title>
<para>
AIX, POWER7, POWER8, POWER9, and Power10 are trademarks or
registered trademarks of International Business Machines
Corporation. Linux is a registered trademark of Linus
Torvalds. Intel is a registered trademark of Intel Corporation
or its subsidiaries. AltiVec is a trademark of Freescale
Semiconductor, Inc.
</para>
</section>
<section xml:id="VIPR.intro.conf">
<title>Conformance to this Specification</title>
<orderedlist>
<listitem>
<para>
Vector programs on OpenPOWER systems should follow the guide
and best practices for vector programming as outlined in
<xref linkend="VIPR.biendian" /> and in <xref
linkend="section_techniques" />.
</para>
</listitem>
<listitem>
<para>
Compliant compilers on OpenPOWER systems should provide
suitable support for intrinsic functions, preferably as
built-in vector functions that translate to one or more
Power ISA instructions as described in <xref
linkend="VIPR.biendian" /> and in <xref
linkend="VIPR.vec-ref" />. Compliant compilers targeting a
supported ISA level (2.7 or 3.0, for example) should provide
support for all intrinsic functions valid for that ISA
level, except where an intrinsic function is marked as
phased in, deferred, or deprecated.
</para>
</listitem>
</orderedlist>
</section>
</chapter>