<?xml version="1.0" encoding="UTF-8"?>
<!--
  Copyright (c) 2017 OpenPOWER Foundation

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.

-->
<section xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="5.0"
  xml:id="sec_power_vector_scalar_floatingpoint">
  <title>Vector-Scalar Floating-Point Operations (VSX)</title>

  <para>With PowerISA 2.06 (POWER7) we extended the vector SIMD capabilities
  of the PowerISA:</para>

  <itemizedlist spacing="compact">
    <listitem>
      <para>Extend the available vector and floating-point scalar register
      sets from 32 registers each to a combined register set of 64 x 64-bit
      scalar floating-point and
      64 x 128-bit vector registers.</para>
    </listitem>
    <listitem>
      <para>Enable scalar double float operations on all 64 scalar
      registers.</para>
    </listitem>
    <listitem>
      <para>Enable vector double and vector float operations for all 64
      vector registers.</para>
    </listitem>
    <listitem>
      <para>Enable super-scalar execution of vector instructions and support
      2 independent vector floating-point pipelines for parallel execution of 4 x
      64-bit floating-point fused multiply-adds (FMAs) and 8 x 32-bit FMAs per
      cycle.</para>
    </listitem>
  </itemizedlist>

  <para>With PowerISA 2.07 (POWER8) we added single-precision scalar
  floating-point instructions to VSX. This completes the floating-point
  computational set for VSX. This ISA release also clarified how these
  instructions operate in the little endian storage model.</para>

  <para>While the focus was on enhanced floating-point computation (for High
  Performance Computing), VSX also extended the ISA with additional storage
  access, logical, and permute (merge, splat, shift) instructions. This was
  necessary to extend these operations to cover all 64 VSX registers, and it
  also improved unaligned storage access for vectors (not available in VMX).</para>

  <para>The PowerISA 2.07B Chapter 7. Vector-Scalar Floating-Point Operations
  is organized starting with an introduction and overview (sections 7.1-7.5).
  The early sections (7.1 and 7.2) describe the layout of the 64 VSX registers
  and how they relate to (overlap and inter-operate with) the existing
  floating-point scalar (FPRs) and vector (VMX VRs) registers.

  <literallayout><literal>7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 317
7.1.1 Overview of the Vector-Scalar Extension . . . . . . . . . . . 317
7.2 VSX Registers . . . . . . . . . . . . . . . . . . . . . . . . . 318
7.2.1 Vector-Scalar Registers . . . . . . . . . . . . . . . . . . . 318
7.2.2 Floating-Point Status and Control Register . . . . . . . . . . 321</literal></literallayout></para>

  <para>Note the definitions given in “7.1.1.1 Compatibility with Category
  Floating-Point and Category Decimal Floating-Point Operations” and
  “7.1.1.2 Compatibility with Category Vector Operations”:
  <blockquote>
    <para>The instruction sets defined in Chapter 4.
    Floating-Point Facility and Chapter 5. Decimal
    Floating-Point retain their definition with one primary
    difference. The FPRs are mapped to doubleword
    element 0 of VSRs 0-31. The contents of doubleword 1
    of the VSR corresponding to a source FPR specified
    by an instruction are ignored. The contents of
    doubleword 1 of a VSR corresponding to the target
    FPR specified by an instruction are undefined.</para>

    <para>The instruction set defined in Chapter 6. Vector Facility
    [Category: Vector], retains its definition with one
    primary difference. The VRs are mapped to VSRs
    32-63.</para></blockquote></para>

  <note><para>The reference to scalar element 0 above is from the big endian
  register perspective of the ISA. In the PPC64LE ABI implementation, and for the
  purpose of porting Intel intrinsics, this is logical doubleword element 1. Intel SSE
  scalar intrinsics operate on logical element [0], which is in the wrong
  position for PowerISA FPU and VSX scalar floating-point operations. Another
  important point is what happens to the other half of the VSR when you execute a
  scalar floating-point instruction (<emphasis>The contents of doubleword 1 of a VSR …
  are undefined.</emphasis>).</para></note>

  <para>The compiler will hide some of this detail when generating code for
  little endian vector element [] notation and most vector built-ins. For example,
  <literal>vec_splat (A, 0)</literal> is transformed for
  PPC64LE to <literal>xxspltd VRT,VRA,1</literal>.
  What the compiler <emphasis><emphasis role="bold">cannot</emphasis></emphasis>
  hide is the different placement of scalars within vector registers.</para>
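
  <para>The element [] notation mentioned above can be sketched with the
  generic vector extensions supported by GCC and Clang. This is a minimal,
  portable illustration under stated assumptions, not PowerISA-specific code:
  the names <literal>v2df</literal> and <literal>splat_double</literal> are
  hypothetical, and the instruction the compiler actually emits depends on the
  target and compiler options.</para>

  <programlisting language="c"><![CDATA[#include <stdio.h>

/* GCC/Clang generic vector extension: two doubles in one 128-bit vector.
   The type name v2df is an assumption made for this sketch. */
typedef double v2df __attribute__ ((vector_size (16)));

/* Replicate logical element n (array order) into both elements.
   The source only names the logical element; the compiler maps it to
   the register element numbering of the target endianness. */
static v2df
splat_double (v2df a, int n)
{
  double e = a[n];
  v2df r = { e, e };
  return r;
}

int
main (void)
{
  v2df a = { 1.0, 2.0 };
  v2df r = splat_double (a, 0);

  /* Both elements now hold the value of logical element 0. */
  printf ("%g %g\n", r[0], r[1]);
  return 0;
}
]]></programlisting>

  <para>Because the source refers only to logical element [0], the same code
  can be compiled for big endian and little endian targets; selecting the
  matching register element is left to the compiler.</para>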

  <para>Vector registers (VRs) 0-31 overlay and can be accessed as vector
  scalar registers (VSRs) 32-63. The ABI also specifies that VR2-13 are used to
  pass parameters and return values. In some cases the same (or similar) operations
  exist in both VMX and VSX instruction forms, while in other cases
  operations only exist for VMX (byte-level permute and shift) or VSX (vector
  double).</para>

  <para>So register selection that avoids unnecessary vector moves and follows
  the ABI, while maintaining the correct instruction-specific register numbering,
  can be tricky. The
  <link xlink:href="https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/Machine-Constraints.html#Machine-Constraints">GCC register constraint</link>
  annotations for inline
  assembler using vector instructions are challenging, even for experts. So only
  experts should be writing assembler, and then only in extraordinary
  circumstances. You should leave these details to the compiler (using vector
  extensions and vector built-ins) whenever possible.</para>

  <para>The next sections get into the details of floating-point
  representation, operations, and exceptions. They describe the implementation
  details for the IEEE-754R and C/C++ language standards that most developers only
  access via higher-level APIs. Most programmers will not need this level of
  detail, but it is there if needed.

  <literallayout><literal>7.3 VSX Operations . . . . . . . . . . . . . . . . . . . . . . . . . 326
7.3.1 VSX Floating-Point Arithmetic Overview . . . . . . . . . . . . 326
7.3.2 VSX Floating-Point Data . . . . . . . . . . . . . . . . . . . 327
7.3.3 VSX Floating-Point Execution Models . . . . . . . . . . . . . 335
7.4 VSX Floating-Point Exceptions . . . . . . . . . . . . . . . . . 338
7.4.1 Floating-Point Invalid Operation Exception . . . . . . . . . . 341
7.4.2 Floating-Point Zero Divide Exception . . . . . . . . . . . . . 347
7.4.3 Floating-Point Overflow Exception. . . . . . . . . . . . . . . 349
7.4.4 Floating-Point Underflow Exception . . . . . . . . . . . . . . 351</literal></literallayout></para>

  <para>Next comes an overview of the VSX storage access instructions for big and
  little endian and for aligned and unaligned data addresses. This includes
  diagrams that illuminate the differences.

  <literallayout><literal>7.5 VSX Storage Access Operations . . . . . . . . . . . . . . . . . 356
7.5.1 Accessing Aligned Storage Operands . . . . . . . . . . . . . . 356
7.5.2 Accessing Unaligned Storage Operands . . . . . . . . . . . . . 357
7.5.3 Storage Access Exceptions . . . . . . . . . . . . . . . . . . 358</literal></literallayout></para>

  <para>Section 7.6 starts with a VSX Instruction Set Summary, which is the
  place to start to get a feel for the types and operations supported. The
  emphasis on floating-point, both scalar and vector (especially vector double), is
  pronounced. Many of the scalar and single-precision vector instructions look
  like duplicates of what we have seen in the Chapter 4 Floating-Point and
  Chapter 6 Vector facilities. The difference here is new instruction encodings
  to access the full 64 VSX register space.</para>

  <para>In addition, there are a small number of logical instructions
  included to support predication (selecting / masking vector elements based on
  comparison results), and a set of permute, merge, shift, and splat instructions that
  operate on VSX word (float) and doubleword (double) elements. As mentioned
  for VMX section 6.8, these instructions are good to study as they are useful
  for realigning elements from PowerISA vector results to the form required for Intel
  intrinsics.

  <literallayout><literal>7.6 VSX Instruction Set . . . . . . . . . . . . . . . . . . . . . . 359
7.6.1 VSX Instruction Set Summary . . . . . . . . . . . . . . . . . 359
7.6.1.1 VSX Storage Access Instructions . . . . . . . . . . . . . . 359
7.6.1.2 VSX Move Instructions . . . . . . . . . . . . . . . . . . . 360
7.6.1.3 VSX Floating-Point Arithmetic Instructions . . . . . . . . 360
7.6.1.4 VSX Floating-Point Compare Instructions . . . . . . . . . . 363
7.6.1.5 VSX DP-SP Conversion Instructions . . . . . . . . . . . . . 364
7.6.1.6 VSX Integer Conversion Instructions . . . . . . . . . . . . 364
7.6.1.7 VSX Round to Floating-Point Integer Instructions . . . . . 366
7.6.1.8 VSX Logical Instructions. . . . . . . . . . . . . . . . . . 366
7.6.1.9 VSX Permute Instructions. . . . . . . . . . . . . . . . . . 367
7.6.2 VSX Instruction Description Conventions . . . . . . . . . . . 368
7.6.3 VSX Instruction Descriptions . . . . . . . . . . . . . . . . 392</literal></literallayout></para>

  <para>The VSX Instruction Descriptions section contains the detailed
  description for each VSX category instruction. The table entries from the
  Instruction Set Summary are formatted in the document as hyperlinks to
  the corresponding instruction descriptions.</para>

</section>