|
|
<?xml version="1.0" encoding="UTF-8"?> |
|
|
<!-- |
|
|
Copyright (c) 2017 OpenPOWER Foundation |
|
|
|
|
|
Licensed under the Apache License, Version 2.0 (the "License"); |
|
|
you may not use this file except in compliance with the License. |
|
|
You may obtain a copy of the License at |
|
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0 |
|
|
|
|
|
Unless required by applicable law or agreed to in writing, software |
|
|
distributed under the License is distributed on an "AS IS" BASIS, |
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
|
|
See the License for the specific language governing permissions and |
|
|
limitations under the License. |
|
|
|
|
|
--> |
|
|
<section xmlns="http://docbook.org/ns/docbook" |
|
|
xmlns:xi="http://www.w3.org/2001/XInclude" |
|
|
xmlns:xlink="http://www.w3.org/1999/xlink" |
|
|
version="5.0" |
|
|
xml:id="sec_power_vector_scalar_floatingpoint"> |
|
|
<title>Vector-Scalar Floating-Point Operations (VSX)</title> |
|
|
|
|
|
<para>With PowerISA 2.06 (POWER7) we extended the vector SIMD capabilities |
|
|
of the PowerISA:</para> |
|
|
|
|
|
<itemizedlist spacing="compact"> |
|
|
<listitem> |
|
|
<para>Extend the available vector and floating-point scalar register |
|
|
sets from 32 registers each to a combined register set of 64 x 64-bit |
|
|
scalar floating-point and |
|
|
64 x 128-bit vector registers.</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>Enable scalar double float operations on all 64 scalar |
|
|
registers.</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>Enable vector double and vector float operations for all 64 |
|
|
vector registers.</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>Enable super-scalar execution of vector instructions and support |
|
|
2 independent vector floating point pipelines for parallel execution of 4 x |
|
|
64-bit Floating point Fused Multiply Adds (FMAs) and 8 x 32-bit FMAs per |
|
|
cycle.</para> |
|
|
</listitem> |
|
|
</itemizedlist> |
|
|
|
|
|
<para>With PowerISA 2.07 (POWER8) we added single-precision scalar |
|
|
floating-point instructions to VSX. This completes the floating-point |
|
|
computational set for VSX. This ISA release also clarified how these operate in |
|
|
the Little Endian storage model.</para> |
|
|
|
|
|
<para>While the focus was on enhanced floating-point computation (for High |
|
|
Performance Computing), VSX also extended the ISA with additional storage |
|
|
access, logical, and permute (merge, splat, shift) instructions. This was |
|
|
necessary to extend these operations to cover 64 VSX registers, and improves |
|
|
unaligned storage access for vectors (not available in VMX).</para> |
|
|
|
|
|
<para>The PowerISA 2.07B Chapter 7. Vector-Scalar Floating-Point Operations |
|
|
is organized starting with an introduction and overview (chapters 7.1- 7.5) . |
|
|
The early sections (7.1 and 7.2) describe the layout of the 64 VSX registers |
|
|
and how they relate (overlap and inter-operate) to the existing floating point |
|
|
scalar (FPRs) and vector (VMX VRs) registers. |
|
|
|
|
|
<literallayout><literal>7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 317 |
|
|
7.1.1 Overview of the Vector-Scalar Extension . . . . . . . . . . . 317 |
|
|
7.2 VSX Registers . . . . . . . . . . . . . . . . . . . . . . . . . 318 |
|
|
7.2.1 Vector-Scalar Registers . . . . . . . . . . . . . . . . . . . 318 |
|
|
7.2.2 Floating-Point Status and Control Register . . . . . . . . . . 321</literal></literallayout></para> |
|
|
|
|
|
<para>The definitions given in “7.1.1.1 Compatibility with Category |
|
|
Floating-Point and Category Decimal Floating-Point Operations”, and |
|
|
“7.1.1.2 Compatibility with Category Vector Operations” |
|
|
<blockquote> |
|
|
<para>The instruction sets defined in Chapter 4. |
|
|
Floating-Point Facility and Chapter 5. Decimal |
|
|
Floating-Point retain their definition with one primary |
|
|
difference. The FPRs are mapped to doubleword |
|
|
element 0 of VSRs 0-31. The contents of doubleword 1 |
|
|
of the VSR corresponding to a source FPR specified |
|
|
by an instruction are ignored. The contents of |
|
|
doubleword 1 of a VSR corresponding to the target |
|
|
FPR specified by an instruction are undefined.</para> |
|
|
|
|
|
<para>The instruction set defined in Chapter 6. Vector Facility |
|
|
[Category: Vector], retains its definition with one |
|
|
primary difference. The VRs are mapped to VSRs |
|
|
32-63.</para></blockquote></para> |
|
|
|
|
|
<note><para>The reference to scalar element 0 above is from the big endian |
|
|
register perspective of the ISA. In the PPC64LE ABI implementation, and for the |
|
|
purpose of porting Intel intrinsics, this is logical doubleword element 1. Intel SSE |
|
|
scalar intrinsics operated on logical element [0], which is in the wrong |
|
|
position for PowerISA FPU and VSX scalar floating-point operations. Another |
|
|
important note is what happens to the other half of the VSR when you execute a |
|
|
scalar floating-point instruction (<emphasis>The contents of doubleword 1 of a VSR … |
|
|
are undefined.</emphasis>)</para></note> |
|
|
|
|
|
<para>The compiler will hide some of this detail when generating code for |
|
|
little endian vector element [] notation and most vector built-ins. For example |
|
|
<literal>vec_splat (A, 0)</literal> is transformed for |
|
|
PPC64LE to <literal>xxspltd VRT,VRA,1</literal>. |
|
|
What the compiler <emphasis><emphasis role="bold">can not</emphasis></emphasis> |
|
|
hide is the different placement of scalars within vector registers.</para> |
|
|
|
|
|
<para>Vector registers (VRs) 0-31 overlay and can be accessed from vector |
|
|
scalar registers (VSRs) 32-63. The ABI also specifies that VR2-13 are used to |
|
|
pass parameter and return values. In some cases the same (similar) operations |
|
|
exist in both VMX and VSX instruction forms, while in the other cases |
|
|
operations only exist for VMX (byte level permute and shift) or VSX (Vector |
|
|
double).</para> |
|
|
|
|
|
<para>So register selection that avoids unnecessary vector moves and follows |
|
|
the ABI while maintaining the correct instruction specific register numbering, |
|
|
can be tricky. The |
|
|
<link xlink:href="https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/Machine-Constraints.html#Machine-Constraints">GCC register constraint</link> |
|
|
annotations for Inline |
|
|
assembler using vector instructions are challenging, even for experts. So only |
|
|
experts should be writing assembler and then only in extraordinary |
|
|
circumstances. You should leave these details to the compiler (using vector |
|
|
extensions and vector built-ins) when ever possible.</para> |
|
|
|
|
|
<para>The next sections gets into the details of floating point |
|
|
representation, operations, and exceptions. They describe the implementation |
|
|
details for the IEEE-754R and C/C++ language standards that most developers only |
|
|
access via higher level APIs. Most programmers will not need this level of |
|
|
detail, but it is there if needed. |
|
|
|
|
|
<literallayout><literal>7.3 VSX Operations . . . . . . . . . . . . . . . . . . . . . . . . . 326 |
|
|
7.3.1 VSX Floating-Point Arithmetic Overview . . . . . . . . . . . . 326 |
|
|
7.3.2 VSX Floating-Point Data . . . . . . . . . . . . . . . . . . . 327 |
|
|
7.3.3 VSX Floating-Point Execution Models . . . . . . . . . . . . . 335 |
|
|
7.4 VSX Floating-Point Exceptions . . . . . . . . . . . . . . . . . 338 |
|
|
7.4.1 Floating-Point Invalid Operation Exception . . . . . . . . . . 341 |
|
|
7.4.2 Floating-Point Zero Divide Exception . . . . . . . . . . . . . 347 |
|
|
7.4.3 Floating-Point Overflow Exception. . . . . . . . . . . . . . . 349 |
|
|
7.4.4 Floating-Point Underflow Exception . . . . . . . . . . . . . . 351</literal></literallayout></para> |
|
|
|
|
|
<para>Next comes an overview of the VSX storage access instructions for big and |
|
|
little endian and for aligned and unaligned data addresses. This included |
|
|
diagrams that illuminate the differences. |
|
|
|
|
|
<literallayout><literal>7.5 VSX Storage Access Operations . . . . . . . . . . . . . . . . . 356 |
|
|
7.5.1 Accessing Aligned Storage Operands . . . . . . . . . . . . . . 356 |
|
|
7.5.2 Accessing Unaligned Storage Operands . . . . . . . . . . . . . 357 |
|
|
7.5.3 Storage Access Exceptions . . . . . . . . . . . . . . . . . . 358</literal></literallayout></para> |
|
|
|
|
|
<para>Section 7.6 starts with a VSX instruction Set Summary which is the |
|
|
place to start to get a feel for the types and operations supported. The |
|
|
emphasis on floating-point, both scalar and vector (especially vector double), is |
|
|
pronounced. Many of the scalar and single-precision vector instructions look |
|
|
like duplicates of what we have seen in the Chapter 4 Floating-Point and |
|
|
Chapter 6 Vector facilities. The difference here is new instruction encodings |
|
|
to access the full 64 VSX register space. </para> |
|
|
|
|
|
<para>In addition there are a small number of logical instructions |
|
|
included to support predication (selecting / masking vector elements based on |
|
|
comparison results), and a set of permute, merge, shift, and splat instructions that |
|
|
operate on VSX word (float) and doubleword (double) elements. As mentioned |
|
|
about VMX section 6.8 these instructions are good to study as they are useful |
|
|
for realigning elements from PowerISA vector results to the form required for Intel |
|
|
Intrinsics. |
|
|
|
|
|
<literallayout><literal>7.6 VSX Instruction Set . . . . . . . . . . . . . . . . . . . . . . 359 |
|
|
7.6.1 VSX Instruction Set Summary . . . . . . . . . . . . . . . . . 359 |
|
|
7.6.1.1 VSX Storage Access Instructions . . . . . . . . . . . . . . 359 |
|
|
7.6.1.2 VSX Move Instructions . . . . . . . . . . . . . . . . . . . 360 |
|
|
7.6.1.3 VSX Floating-Point Arithmetic Instructions . . . . . . . . 360 |
|
|
7.6.1.4 VSX Floating-Point Compare Instructions . . . . . . . . . . 363 |
|
|
7.6.1.5 VSX DP-SP Conversion Instructions . . . . . . . . . . . . . 364 |
|
|
7.6.1.6 VSX Integer Conversion Instructions . . . . . . . . . . . . 364 |
|
|
7.6.1.7 VSX Round to Floating-Point Integer Instructions . . . . . 366 |
|
|
7.6.1.8 VSX Logical Instructions. . . . . . . . . . . . . . . . . . 366 |
|
|
7.6.1.9 VSX Permute Instructions. . . . . . . . . . . . . . . . . . 367 |
|
|
7.6.2 VSX Instruction Description Conventions . . . . . . . . . . . 368 |
|
|
7.6.3 VSX Instruction Descriptions . . . . . . . . . . . . . . . . 392</literal></literallayout></para> |
|
|
|
|
|
<para>The VSX Instruction Descriptions section contains the detail |
|
|
description for each VSX category instruction. The table entries from the |
|
|
Instruction Set Summary are formatted in the document as hyperlinks to |
|
|
corresponding instruction descriptions.</para> |
|
|
|
|
|
</section> |
|
|
|
|
|
|