<?xml version="1.0" encoding="UTF-8"?>
<!--
  Copyright (c) 2017 OpenPOWER Foundation

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.

-->
<section xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="5.0"
  xml:id="sec_powerisa_vector_size_type">
<title>How vector elements change size and type</title>
  <para>Most vector built-ins return the same vector type as their (first)
  input parameter, but there are exceptions. Examples include conversions
  between types, compares, pack, unpack, merge, and integer multiply
  operations.</para>
  <para>Converting between floating-point and integer types changes the element type and sometimes
  changes the element size as well (double ↔ int and float ↔ long). For
  VMX the conversions always preserve the element size (float ↔ [unsigned] int). But
  VSX allows conversion of 64-bit (long or double) elements to and from 32-bit (float or
  int) elements, with the inherent size changes. The PowerISA VSX defines a 4-element
  vector layout where little-endian elements 0 and 2 are used for input/output and
  elements 1 and 3 are undefined. The OpenPOWER ABI Appendix A defines
  <literal>vec_double</literal> and <literal>vec_float</literal>
  with even/odd and high/low extensions as programming aids. These are not
  included in GCC 7 or earlier but are planned for GCC 8.</para>
  <para>Compare operations produce either
  <literal>vector bool &lt;</literal>input element type<literal>&gt;</literal>
  (effectively bit masks) or predicates (the condition code for all and
  any is represented as an int truth value). When a predicate compare (e.g.,
  <literal>vec_all_eq</literal>, <literal>vec_any_gt</literal>)
  is used in an if statement, the condition code is
  used directly in the conditional branch and the int truth value is not
  generated.</para>
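  <para>The difference between a mask result and a predicate result can be
  sketched in portable C. This is a scalar model only, not the compiler
  built-ins: the struct types and the <literal>model_</literal> helper names
  are hypothetical and exist just for this illustration.</para>
  <programlisting><![CDATA[
```c
#include <stdint.h>

/* Scalar model only: these names are hypothetical, not compiler built-ins. */
typedef struct { int32_t  e[4]; } v4si;   /* models vector signed int */
typedef struct { uint32_t e[4]; } v4bool; /* models vector bool int   */

/* Element-wise compare: each result element is all ones or all zeros. */
static inline v4bool model_cmpeq(v4si a, v4si b)
{
    v4bool r;
    for (int i = 0; i < 4; i++)
        r.e[i] = (a.e[i] == b.e[i]) ? UINT32_MAX : 0u;
    return r;
}

/* Predicate: 1 only if every pair of elements is equal. */
static inline int model_all_eq(v4si a, v4si b)
{
    for (int i = 0; i < 4; i++)
        if (a.e[i] != b.e[i])
            return 0;
    return 1;
}

/* Predicate: 1 if any element of a is greater than the same element of b. */
static inline int model_any_gt(v4si a, v4si b)
{
    for (int i = 0; i < 4; i++)
        if (a.e[i] > b.e[i])
            return 1;
    return 0;
}
```
]]></programlisting>
  <para>In this model the mask form returns a full vector that later
  operations can use to select elements, while the predicate forms collapse
  the whole comparison into a single int suitable for an if statement.</para>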
  <para>Pack operations pack integer elements into the next smaller (half-size)
  integer elements. Pack operations include signed and unsigned saturate
  and unsigned modulo forms. As each packed element is half the size (in
  bits) of its input, pack instructions require 2 vectors (256 bits) as input and generate a
  single 128-bit vector result.
  <programlisting><![CDATA[vec_vpkudum ({1, 2}, {101, 102}) result={1, 2, 101, 102}]]></programlisting></para>
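  <para>The modulo and saturate forms differ only in how they treat values
  that do not fit in the smaller element. The following is a scalar sketch in
  portable C of packing doublewords into words. It is not the built-ins
  themselves: the struct types and <literal>model_</literal> helper names are
  hypothetical, and elements are numbered in the same order as in the listing
  above.</para>
  <programlisting><![CDATA[
```c
#include <stdint.h>

/* Scalar model only: these names are hypothetical, not compiler built-ins. */
typedef struct { uint64_t e[2]; } v2du;
typedef struct { int64_t  e[2]; } v2di;
typedef struct { uint32_t e[4]; } v4su;
typedef struct { int32_t  e[4]; } v4si;

/* Modulo form: keep only the low 32 bits of each 64-bit element. */
static inline v4su model_pack_modulo(v2du a, v2du b)
{
    v4su r;
    r.e[0] = (uint32_t)a.e[0];
    r.e[1] = (uint32_t)a.e[1];
    r.e[2] = (uint32_t)b.e[0];
    r.e[3] = (uint32_t)b.e[1];
    return r;
}

/* Clamp a 64-bit signed value into the 32-bit signed range. */
static inline int32_t model_sat_s32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Signed saturate form: out-of-range values clamp instead of wrapping. */
static inline v4si model_pack_signed_saturate(v2di a, v2di b)
{
    v4si r;
    r.e[0] = model_sat_s32(a.e[0]);
    r.e[1] = model_sat_s32(a.e[1]);
    r.e[2] = model_sat_s32(b.e[0]);
    r.e[3] = model_sat_s32(b.e[1]);
    return r;
}
```
]]></programlisting>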
  <para>Unpack operations expand integer elements into the next larger size
  elements. The integers are always treated as signed values and sign-extended.
  The processor design avoids instructions that return multiple register values,
  so the PowerISA defines unpack-high and unpack-low forms, where the instruction
  takes the high or low half of the vector elements and extends them to fill the
  vector output. Element order is maintained, and an unpack high / low sequence
  with the same input vector has the effect of unpacking to a 256-bit result in two
  vector registers.
  <programlisting><![CDATA[vec_vupkhsw ({1, 2, 3, 4}) result={1, 2}
vec_vupkhsw ({-1, 2, -3, 4}) result={-1, 2}
vec_vupklsw ({1, 2, 3, 4}) result={3, 4}
vec_vupklsw ({-1, 2, -3, 4}) result={-3, 4}]]></programlisting></para>
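  <para>The high / low split and the sign extension can be modeled in portable
  scalar C. As a sketch only, with hypothetical struct types and
  <literal>model_</literal> helper names, and with elements numbered as in the
  listing above:</para>
  <programlisting><![CDATA[
```c
#include <stdint.h>

/* Scalar model only: these names are hypothetical, not compiler built-ins. */
typedef struct { int32_t e[4]; } v4si;
typedef struct { int64_t e[2]; } v2di;

/* Unpack high: sign-extend the first two words to doublewords. */
static inline v2di model_unpack_high(v4si a)
{
    v2di r;
    r.e[0] = a.e[0]; /* int32_t to int64_t conversion sign-extends */
    r.e[1] = a.e[1];
    return r;
}

/* Unpack low: sign-extend the last two words to doublewords. */
static inline v2di model_unpack_low(v4si a)
{
    v2di r;
    r.e[0] = a.e[2];
    r.e[1] = a.e[3];
    return r;
}
```
]]></programlisting>
  <para>Calling both helpers on the same input yields all four widened
  elements, in order, across two results. This corresponds to the 256-bit
  result in two vector registers described above.</para>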
  <para>Merge operations resemble shuffling two card decks (vectors)
  together, alternating cards (elements) in the result. As we are merging from
  2 vectors (256 bits) into 1 vector (128 bits) and the elements do not change
  size, we have merge high and merge low instruction forms for each (byte,
  halfword, and word) integer type. The merge high operations alternate elements
  from the high (vector register left) half of the two input vectors. The merge
  low operations alternate elements from the low (vector register right) half of
  the two input vectors.</para>
  <para>For PowerISA 2.07 we added vector merge word even / odd instructions.
  Instead of high or low elements, the shuffle is from the even- or odd-numbered
  elements of the two input vectors. Passing the same vector to both inputs of a
  merge produces splat-like results for each doubleword half, which is handy in
  some convert operations.
  <programlisting><![CDATA[vec_mrghd ({1, 2}, {101, 102}) result={1, 101}
vec_mrgld ({1, 2}, {101, 102}) result={2, 102}

vec_vmrghw ({1, 2, 3, 4}, {101, 102, 103, 104}) result={1, 101, 2, 102}
vec_vmrghw ({1, 2, 3, 4}, {1, 2, 3, 4}) result={1, 1, 2, 2}
vec_vmrglw ({1, 2, 3, 4}, {101, 102, 103, 104}) result={3, 103, 4, 104}
vec_vmrglw ({1, 2, 3, 4}, {1, 2, 3, 4}) result={3, 3, 4, 4}

vec_mergee ({1, 2, 3, 4}, {101, 102, 103, 104}) result={1, 101, 3, 103}
vec_mergee ({1, 2, 3, 4}, {1, 2, 3, 4}) result={1, 1, 3, 3}
vec_mergeo ({1, 2, 3, 4}, {101, 102, 103, 104}) result={2, 102, 4, 104}
vec_mergeo ({1, 2, 3, 4}, {1, 2, 3, 4}) result={2, 2, 4, 4}]]></programlisting></para>
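  <para>The four word merge forms differ only in which source elements they
  interleave. A scalar sketch in portable C, using a hypothetical struct type
  and hypothetical <literal>model_</literal> helper names, and the same
  element numbering as the listing above:</para>
  <programlisting><![CDATA[
```c
#include <stdint.h>

/* Scalar model only: these names are hypothetical, not compiler built-ins. */
typedef struct { int32_t e[4]; } v4si;

/* Interleave elements i0 and i1 of a with the same elements of b. */
static inline v4si model_merge(v4si a, v4si b, int i0, int i1)
{
    v4si r;
    r.e[0] = a.e[i0];
    r.e[1] = b.e[i0];
    r.e[2] = a.e[i1];
    r.e[3] = b.e[i1];
    return r;
}

/* High half: elements 0 and 1.  Low half: elements 2 and 3. */
static inline v4si model_merge_high(v4si a, v4si b) { return model_merge(a, b, 0, 1); }
static inline v4si model_merge_low(v4si a, v4si b)  { return model_merge(a, b, 2, 3); }

/* Even elements: 0 and 2.  Odd elements: 1 and 3. */
static inline v4si model_merge_even(v4si a, v4si b) { return model_merge(a, b, 0, 2); }
static inline v4si model_merge_odd(v4si a, v4si b)  { return model_merge(a, b, 1, 3); }
```
]]></programlisting>
  <para>Passing the same vector for both arguments reproduces the splat-like
  results shown in the listing, for example {1, 1, 3, 3} from the even form
  applied to {1, 2, 3, 4}.</para>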
  <para>Integer multiply has the potential to generate twice as many bits in
  the product as in each input. A multiply of 2 int (32-bit) values produces a long
  (64-bit) product. Normal C language * operations ignore this and discard the top
  32 bits of the result. However, in some computations it is useful to preserve the
  double-precision product for intermediate computation before reducing the final
  result back to the original precision.</para>
  <para>The PowerISA VMX instruction set took the latter approach, i.e., keep all
  the product bits until the programmer explicitly asks for the truncated result
  (via the pack operation).
  So the vector integer multiplies are split into even/odd forms across signed and
  unsigned byte, halfword, and word inputs. This requires two instructions (given
  the same inputs) to generate the full vector multiply across 2 vector
  registers and 256 bits. Again, as POWER processors are superscalar, this pair of
  instructions should execute in parallel.</para>
  <para>The set of expanded product values can either be used directly in
  further (doubled precision) computation or merged/packed into a single
  vector at the smaller bit size. This is what the compiler will generate for C
  vector extension multiply of vector integer types.</para>
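  <para>The even/odd multiply and the later reduction to the original element
  size can be sketched in portable scalar C. This models signed word inputs
  only. It assumes even elements are 0 and 2 and odd elements are 1 and 3, as
  in the merge even / odd examples. The struct types and
  <literal>model_</literal> helper names are hypothetical, and the recombining
  step is one possible arrangement, not necessarily the instruction sequence a
  compiler emits.</para>
  <programlisting><![CDATA[
```c
#include <stdint.h>

/* Scalar model only: these names are hypothetical, not compiler built-ins. */
typedef struct { int32_t  e[4]; } v4si;
typedef struct { int64_t  e[2]; } v2di;
typedef struct { uint32_t e[4]; } v4su;

/* Multiply even: full 64-bit products of elements 0 and 2. */
static inline v2di model_mul_even(v4si a, v4si b)
{
    v2di r;
    r.e[0] = (int64_t)a.e[0] * b.e[0];
    r.e[1] = (int64_t)a.e[2] * b.e[2];
    return r;
}

/* Multiply odd: full 64-bit products of elements 1 and 3. */
static inline v2di model_mul_odd(v4si a, v4si b)
{
    v2di r;
    r.e[0] = (int64_t)a.e[1] * b.e[1];
    r.e[1] = (int64_t)a.e[3] * b.e[3];
    return r;
}

/* Reduce back to 32 bits: interleave the low words of the even and odd
   products so each result lands in its original element position. */
static inline v4su model_mul_low(v4si a, v4si b)
{
    v2di ev = model_mul_even(a, b);
    v2di od = model_mul_odd(a, b);
    v4su r;
    r.e[0] = (uint32_t)(uint64_t)ev.e[0];
    r.e[1] = (uint32_t)(uint64_t)od.e[0];
    r.e[2] = (uint32_t)(uint64_t)ev.e[1];
    r.e[3] = (uint32_t)(uint64_t)od.e[1];
    return r;
}
```
]]></programlisting>
  <para>The low words are returned as unsigned values here only to keep the
  truncation well defined in portable C. The bit patterns are the same as
  those of a truncating signed multiply.</para>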
</section>