<?xml version="1.0" encoding="UTF-8"?>
<!--
  Copyright (c) 2017 OpenPOWER Foundation

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.

-->
<section xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="5.0"
  xml:id="sec_vec_or_not">
  <title>To vec_not or not</title>

  <para>Now let's look at a similar example that adds some surprising
  complexity. When we look at the negated compare forms, we cannot find
  exact matches in the PowerISA. But a little knowledge of Boolean
  algebra shows the way to equivalent functions.</para>

  <para>First, consider the X86 compare not equal case, where we might expect to
  find equivalent <literal>vec_cmpne</literal> builtins for PowerISA:
  <programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_pd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpneqpd ((__v2df)__A, (__v2df)__B);
}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_sd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpneqsd ((__v2df)__A, (__v2df)__B);
}]]></programlisting></para>

  <para>Well, not exactly. Looking at the OpenPOWER ABI document, we see a
  reference to <literal>vec_cmpne</literal> for all numeric types. But when
  we look in the current GCC 6 documentation, we find that
  <literal>vec_cmpne</literal> is not on the list. So it is planned
  in the ABI, but not implemented yet.</para>

  <para>Looking at the PowerISA 2.07B, we find a VSX Vector Compare Equal To
  Double-Precision instruction but no Not Equal. In fact, the only vector double
  compare instructions are equal, greater than, and greater than or equal.
  Not only is there no not equal, there are no less than or
  less than or equal compares either.</para>

  <para>So what is going on here? Partially this is the Reduced Instruction
  Set Computer (RISC) design philosophy. In this case the compiler can generate
  all the required compares using the existing vector instructions and simple
  transforms based on Boolean algebra. So
  <literal>vec_cmpne(A,B)</literal> is simply
  <literal>vec_not(vec_cmpeq(A,B))</literal>. And
  <literal>vec_cmplt(A,B)</literal> is simply
  <literal>vec_cmpgt(B,A)</literal>, based on the
  identity A &lt; B <emphasis><emphasis role="bold">iff</emphasis></emphasis> B &gt; A.
  Similarly, <literal>vec_cmple(A,B)</literal> is implemented as
  <literal>vec_cmpge(B,A)</literal>.</para>
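
  <para>The transforms above can be checked with a small scalar model. The
  following is only a sketch in portable C, not PowerISA code: the types
  <literal>v2d</literal> and <literal>v2m</literal> and the
  <literal>model_*</literal> helpers are hypothetical stand-ins for a
  two-element double vector, its compare mask, and the corresponding
  <literal>vec_*</literal> operations. The bitwise complement is written
  directly here; how a vector <emphasis role="bold">not</emphasis> is actually
  obtained is a separate question.
  <programlisting><![CDATA[#include <assert.h>
#include <stdint.h>

/* Hypothetical models, NOT the altivec.h types. */
typedef struct { double d[2]; } v2d;      /* 2 x double vector  */
typedef struct { uint64_t m[2]; } v2m;    /* 2 x 64-bit lane mask */

/* A true compare sets every bit of the lane, a false one clears it. */
static uint64_t lane_mask (int cond) { return cond ? ~UINT64_C (0) : 0; }

/* Models of the three compares that exist as instructions. */
static v2m model_cmpeq (v2d a, v2d b)
{
  v2m r = {{ lane_mask (a.d[0] == b.d[0]), lane_mask (a.d[1] == b.d[1]) }};
  return r;
}

static v2m model_cmpgt (v2d a, v2d b)
{
  v2m r = {{ lane_mask (a.d[0] > b.d[0]), lane_mask (a.d[1] > b.d[1]) }};
  return r;
}

static v2m model_cmpge (v2d a, v2d b)
{
  v2m r = {{ lane_mask (a.d[0] >= b.d[0]), lane_mask (a.d[1] >= b.d[1]) }};
  return r;
}

/* Plain bitwise complement of a mask. */
static v2m model_not (v2m a)
{
  v2m r = {{ ~a.m[0], ~a.m[1] }};
  return r;
}

/* The derived compares: no new "instructions", only transforms. */
static v2m model_cmpne (v2d a, v2d b) { return model_not (model_cmpeq (a, b)); }
static v2m model_cmplt (v2d a, v2d b) { return model_cmpgt (b, a); }
static v2m model_cmple (v2d a, v2d b) { return model_cmpge (b, a); }

/* Usage: lane 0 compares 1.0 with 2.0, lane 1 compares 5.0 with 5.0. */
static void check_identities (void)
{
  v2d a = {{ 1.0, 5.0 }};
  v2d b = {{ 2.0, 5.0 }};
  v2m ne = model_cmpne (a, b);
  v2m lt = model_cmplt (a, b);
  v2m le = model_cmple (a, b);

  assert (ne.m[0] == UINT64_MAX && ne.m[1] == 0);
  assert (lt.m[0] == UINT64_MAX && lt.m[1] == 0);
  assert (le.m[0] == UINT64_MAX && le.m[1] == UINT64_MAX);
}]]></programlisting></para>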

  <para>Wait a minute, there is no <literal>vec_not()</literal> either. It cannot be found in the
  PowerISA, the OpenPOWER ABI, or the GCC PowerPC AltiVec Built-in documentation.
  There is no <literal>vec_move()</literal> either! How can this possibly work?</para>

  <para>This is the RISC philosophy again. We can always use a logical
  instruction (like bitwise <emphasis role="bold">and</emphasis> or
  <emphasis role="bold">or</emphasis>) to effect a move, given that we also have
  nondestructive three-register instruction forms. In the PowerISA, most instructions
  have two input registers and a separate result register. So if the result
  register number is different from either input register, then the inputs are
  not clobbered (nondestructive). Of course, nothing prevents you from specifying
  the same register for both inputs or even for all three registers (result and both
  inputs). And sometimes that is useful.</para>

  <para>The statement <literal>B = vec_or (A,A)</literal> is effectively a vector move/copy
  from <literal>A</literal> to <literal>B</literal>. And <literal>A = vec_or (A,A)</literal> is obviously a
  <emphasis role="bold"><literal>nop</literal></emphasis> (no operation). In fact, the
  PowerISA defines the preferred <literal>nop</literal> and register move for vector registers
  in this way.</para>
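
  <para>The move and <literal>nop</literal> behaviour follows from the fact that
  a value <emphasis role="bold">or</emphasis>ed with itself is unchanged. A
  minimal sketch in portable C, where <literal>v128</literal>,
  <literal>model_or</literal> and <literal>same_bits</literal> are hypothetical
  names that model a 128-bit register and are not part of any PowerISA interface:
  <programlisting><![CDATA[#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register as two 64-bit words. */
typedef struct { uint64_t w[2]; } v128;

/* Model of a three-operand bitwise or: result = a | b. */
static v128 model_or (v128 a, v128 b)
{
  v128 r;
  for (int i = 0; i < 2; i++)
    r.w[i] = a.w[i] | b.w[i];
  return r;
}

/* Returns 1 when both registers hold the same 128 bits. */
static int same_bits (v128 a, v128 b)
{
  return a.w[0] == b.w[0] && a.w[1] == b.w[1];
}

static void check_or_as_move (void)
{
  v128 a = {{ UINT64_C (0x0123456789abcdef), UINT64_C (0xfedcba9876543210) }};

  v128 b = model_or (a, a);   /* B = or (A, A): a copy of A      */
  assert (same_bits (a, b));

  a = model_or (a, a);        /* A = or (A, A): A is left as is  */
  assert (same_bits (a, b));
}]]></programlisting></para>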

  <para>The PowerISA implements the logical operators
  <emphasis role="bold">nor</emphasis> (<emphasis role="bold">not or</emphasis>)
  and <emphasis role="bold">nand</emphasis> (<emphasis role="bold">not and</emphasis>),
  and provides these instructions for both
  fixed point and vector logical operations. So <literal>vec_not(A)</literal>
  can be implemented as <literal>vec_nor(A,A)</literal>.
  For the implementation of <literal>_mm_cmpneq_pd</literal> and
  <literal>_mm_cmpneq_sd</literal> we propose the following:
  <programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_pd (__m128d __A, __m128d __B)
{
  /* not (A == B), with nor (x, x) standing in for the missing vec_not.  */
  __v2df temp = (__v2df) vec_cmpeq ((__v2df)__A, (__v2df)__B);
  return ((__m128d)vec_nor (temp, temp));
}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_sd (__m128d __A, __m128d __B)
{
  __v2df a, b, c;
  /* Splat element 0 so that both lanes compare the scalar operands.  */
  a = vec_splat (__A, 0);
  b = vec_splat (__B, 0);
  c = (__v2df)vec_cmpeq (a, b);
  c = (__v2df)vec_nor (c, c);
  /* Element 0 holds the compare mask; element 1 is passed through from __A.  */
  return ((__m128d){c[0], __A[1]});
}]]></programlisting></para>
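
  <para>To make the expected result of the scalar form concrete, here is a
  sketch in portable C that models it without any vector builtins. It assumes
  64-bit IEEE 754 doubles; the type <literal>v2d</literal> and the helpers
  <literal>bits_of</literal>, <literal>double_of</literal>,
  <literal>model_nor</literal> and <literal>model_cmpneq_sd</literal> are
  hypothetical names introduced only for this illustration:
  <programlisting><![CDATA[#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 2 x double vector, NOT the real __m128d. */
typedef struct { double d[2]; } v2d;

/* Reinterpret a double as its 64-bit pattern and back (no conversion). */
static uint64_t bits_of (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

static double double_of (uint64_t u)
{
  double x;
  memcpy (&x, &u, sizeof x);
  return x;
}

/* Model of nor on one 64-bit lane; nor (x, x) is the complement of x. */
static uint64_t model_nor (uint64_t a, uint64_t b) { return ~(a | b); }

/* Element 0: all ones if a[0] != b[0], else zero.  Element 1: copy of a[1]. */
static v2d model_cmpneq_sd (v2d a, v2d b)
{
  uint64_t eq = (a.d[0] == b.d[0]) ? ~UINT64_C (0) : 0;
  uint64_t ne = model_nor (eq, eq);
  v2d r;
  r.d[0] = double_of (ne);
  r.d[1] = a.d[1];
  return r;
}

static void check_cmpneq_sd (void)
{
  v2d a = {{ 1.0, 7.0 }};
  v2d b = {{ 2.0, 9.0 }};

  v2d r = model_cmpneq_sd (a, b);           /* 1.0 != 2.0 */
  assert (bits_of (r.d[0]) == UINT64_MAX);
  assert (r.d[1] == 7.0);

  r = model_cmpneq_sd (a, a);               /* 1.0 == 1.0 */
  assert (bits_of (r.d[0]) == 0);
  assert (r.d[1] == 7.0);
}]]></programlisting>
  The mask is inspected through its bit pattern because an all-ones lane is not
  a meaningful floating-point value.</para>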

  <para>The Intel Intrinsics also include the <emphasis role="bold">not</emphasis>
  forms of the relational compares:
  <programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnlt_pd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpnltpd ((__v2df)__A, (__v2df)__B);
}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnle_pd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpnlepd ((__v2df)__A, (__v2df)__B);
}]]></programlisting></para>

  <para>Neither the PowerISA, the OpenPOWER ABI, nor the GCC PowerPC AltiVec Built-in
  documentation provides any direct equivalents to the not less than and
  not greater than class of compares. Again, you don't really need them if you know Boolean
  algebra. We can use identities like
  {<emphasis role="bold">not</emphasis> (A &lt; B) iff A &gt;= B} and
  {<emphasis role="bold">not</emphasis> (A &lt;= B) iff A &gt; B}.
  So the PPC64LE implementation follows:
  <programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnlt_pd (__m128d __A, __m128d __B)
{
  return ((__m128d)vec_cmpge (__A, __B));
}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnle_pd (__m128d __A, __m128d __B)
{
  return ((__m128d)vec_cmpgt (__A, __B));
}]]></programlisting></para>
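
  <para>One caveat is worth checking before relying on these identities. They
  hold for ordered operands, but every IEEE 754 comparison involving a NaN is
  false, so <emphasis role="bold">not</emphasis> (A &lt; B) is true for a NaN
  operand while A &gt;= B is false. The sketch below models this with scalar C
  comparisons. The <literal>model_*</literal> helpers are hypothetical, and
  reading the X86 not forms as the logical negation of the ordinary compare
  is our assumption about their semantics:
  <programlisting><![CDATA[#include <assert.h>
#include <math.h>
#include <stdint.h>

/* A true compare sets every bit of the lane, a false one clears it. */
static uint64_t lane_mask (int cond) { return cond ? ~UINT64_C (0) : 0; }

/* "Not" forms modeled as the logical negation of the ordinary compare. */
static uint64_t model_cmpnlt (double a, double b) { return lane_mask (!(a < b)); }
static uint64_t model_cmpnle (double a, double b) { return lane_mask (!(a <= b)); }

/* Direct compares, as a >= or > instruction would evaluate them. */
static uint64_t model_cmpge (double a, double b) { return lane_mask (a >= b); }
static uint64_t model_cmpgt (double a, double b) { return lane_mask (a > b); }

static void check_not_compares (void)
{
  /* Ordered operands: the identities hold in every case. */
  const double v[3] = { -1.0, 0.0, 2.5 };
  for (int i = 0; i < 3; i++)
    for (int j = 0; j < 3; j++)
      {
        assert (model_cmpnlt (v[i], v[j]) == model_cmpge (v[i], v[j]));
        assert (model_cmpnle (v[i], v[j]) == model_cmpgt (v[i], v[j]));
      }

  /* Unordered operands: the two sides disagree. */
  assert (model_cmpnlt (NAN, 1.0) == UINT64_MAX);
  assert (model_cmpge (NAN, 1.0) == 0);
  assert (model_cmpnle (1.0, NAN) == UINT64_MAX);
  assert (model_cmpgt (1.0, NAN) == 0);
}]]></programlisting>
  Code that must match the X86 results for NaN inputs may therefore need
  additional handling. That is outside what this sketch shows.</para>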

  <para>These patterns repeat for the scalar versions of the
  <emphasis role="bold">not</emphasis> compares. In general, the larger
  pattern described in this chapter applies to the other
  float and integer types with similar interfaces.</para>

</section>