<?xml version="1.0" encoding="UTF-8"?>
<!--
  Copyright (c) 2017 OpenPOWER Foundation

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.

-->
<section xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="5.0"
  xml:id="sec_vec_or_not">
  <title>To vec_not or not</title>

  <para>Now let's look at a similar example that adds some surprising
  complexity. When we look at the negated compare forms we cannot find
  exact matches in the PowerISA. But a little knowledge of Boolean
  algebra can show the way to the equivalent functions.</para>

  <para>First the X86 compare not equal case where we might expect to
  find the equivalent vec_cmpne builtins for PowerISA:
  <programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_pd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpneqpd ((__v2df)__A, (__v2df)__B);
}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_sd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpneqsd ((__v2df)__A, (__v2df)__B);
}]]></programlisting></para>

  <para>Well, not exactly. Looking at the OpenPOWER ABI document we see a
  reference to
  <literal>vec_cmpne</literal> for all numeric types. But when we look in the current
  GCC 6 documentation we find that
  <literal>vec_cmpne</literal> is not on the list. So it is planned
  in the ABI, but not implemented yet.</para>

  <para>Looking at the PowerISA 2.07B we find a VSX Vector Compare Equal to
  Double-Precision but no Not Equal. In fact we see only vector double compare
  instructions for greater than and greater than or equal in addition to the
  equal compare. Not only can't we find a not equal, there are no less than or
  less than or equal compares either.</para>

  <para>So what is going on here? Partially this is the Reduced Instruction
  Set Computer (RISC) design philosophy. In this case the compiler can generate
  all the required compares using the existing vector instructions and simple
  transforms based on Boolean algebra. So
  <literal>vec_cmpne(A,B)</literal> is simply
  <literal>vec_not(vec_cmpeq(A,B))</literal>. And <literal>vec_cmplt(A,B)</literal> is simply
  <literal>vec_cmpgt(B,A)</literal> based on the
  identity A &lt; B <emphasis><emphasis role="bold">iff</emphasis></emphasis> B &gt; A.
  Similarly <literal>vec_cmple(A,B)</literal> is implemented as
  <literal>vec_cmpge(B,A)</literal>.</para>
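
  <para>As a rough illustration of these transforms, the following sketch
  models a two-element vector double in portable C. The types
  <literal>v2d</literal> and <literal>v2mask</literal> and the
  <literal>model_</literal> functions are hypothetical names introduced only
  for this example. They are not the <literal>altivec.h</literal> built-ins and
  make no claim about the code GCC actually generates. Only equal, greater
  than, and greater than or equal are written as primitives; the other three
  compares are derived from them.
  <programlisting><![CDATA[#include <stdint.h>

/* Hypothetical two-lane model of a vector double and its compare mask. */
typedef struct { double d[2]; } v2d;
typedef struct { uint64_t m[2]; } v2mask;

/* Primitive compares: a lane is all ones when true, all zeros when false. */
static v2mask model_cmpeq (v2d a, v2d b)
{
  v2mask r;
  for (int i = 0; i < 2; i++)
    r.m[i] = (a.d[i] == b.d[i]) ? UINT64_MAX : 0;
  return r;
}

static v2mask model_cmpgt (v2d a, v2d b)
{
  v2mask r;
  for (int i = 0; i < 2; i++)
    r.m[i] = (a.d[i] > b.d[i]) ? UINT64_MAX : 0;
  return r;
}

static v2mask model_cmpge (v2d a, v2d b)
{
  v2mask r;
  for (int i = 0; i < 2; i++)
    r.m[i] = (a.d[i] >= b.d[i]) ? UINT64_MAX : 0;
  return r;
}

/* Derived compares: swap the operands, or complement the equal mask. */
static v2mask model_cmplt (v2d a, v2d b) { return model_cmpgt (b, a); }
static v2mask model_cmple (v2d a, v2d b) { return model_cmpge (b, a); }

static v2mask model_cmpne (v2d a, v2d b)
{
  v2mask r = model_cmpeq (a, b);
  r.m[0] = ~r.m[0];
  r.m[1] = ~r.m[1];
  return r;
}]]></programlisting></para>

  <para>With A = {1.0, 5.0} and B = {2.0, 5.0}, this model sets only element 0
  for <literal>model_cmplt</literal> and <literal>model_cmpne</literal>, and
  both elements for <literal>model_cmple</literal>.</para>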

  <para>Wait a minute, there is no <literal>vec_not()</literal> either. We cannot find it in the
  PowerISA, the OpenPOWER ABI, or the GCC PowerPC Altivec Built-in documentation.
  There is no <literal>vec_move()</literal> either! How can this possibly work?</para>

  <para>This is RISC philosophy again. We can always use a logical
  instruction (like bitwise <emphasis role="bold">and</emphasis> or
  <emphasis role="bold">or</emphasis>) to effect a move, given that we also have
  nondestructive 3 register instruction forms. In the PowerISA most instructions
  have two input registers and a separate result register. So if the result
  register number is different from either input register then the inputs are
  not clobbered (nondestructive). Of course nothing prevents you from specifying
  the same register for both inputs or even all three registers (result and both
  inputs). And sometimes it is useful.</para>

  <para>The statement <literal>B = vec_or (A,A)</literal> is effectively a vector move/copy
  from <literal>A</literal> to <literal>B</literal>. And <literal>A = vec_or (A,A)</literal> is obviously a
  <emphasis role="bold"><literal>nop</literal></emphasis> (no operation). In fact the
  PowerISA defines the preferred <literal>nop</literal> and register move for vector registers
  in this way.</para>
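
  <para>The move idiom can be sketched in portable C as follows. The
  <literal>vreg</literal> type and the <literal>model_or</literal> and
  <literal>model_move</literal> functions are hypothetical stand-ins for a
  128-bit vector register and a three-operand logical instruction, assumed
  here only for illustration. The result is a separate value, so the inputs are
  left untouched, and passing the same input twice yields a copy.
  <programlisting><![CDATA[#include <stdint.h>

/* Hypothetical model of a 128-bit vector register as two 64-bit words. */
typedef struct { uint64_t w[2]; } vreg;

/* Three-operand OR: the result is separate from both inputs. */
static vreg model_or (vreg a, vreg b)
{
  vreg t = { { a.w[0] | b.w[0], a.w[1] | b.w[1] } };
  return t;
}

/* A move built from OR, using the identity (A | A) == A. */
static vreg model_move (vreg a)
{
  return model_or (a, a);
}]]></programlisting></para>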

  <para>The PowerISA implements the logical operators
  <emphasis role="bold">nor</emphasis> (<emphasis role="bold">not or</emphasis>)
  and <emphasis role="bold">nand</emphasis> (<emphasis role="bold">not and</emphasis>).
  The PowerISA provides these instructions for
  fixed point and vector logical operations. So <literal>vec_not(A)</literal>
  can be implemented as <literal>vec_nor(A,A)</literal>.
  So for the implementation of <literal>_mm_cmpneq_pd</literal> and
  <literal>_mm_cmpneq_sd</literal> we propose the following:
  <programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_pd (__m128d __A, __m128d __B)
{
  __v2df temp = (__v2df ) vec_cmpeq ((__v2df) __A, (__v2df)__B);
  return ((__m128d)vec_nor (temp, temp));
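}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_sd (__m128d __A, __m128d __B)
{
  __v2df a, b, c;
  /* Splat element 0 of each operand so the compare sees the scalar values.  */
  a = vec_splat (__A, 0);
  b = vec_splat (__B, 0);
  c = (__v2df) vec_cmpeq (a, b);
  /* nor (c, c) is the bitwise complement of the equal mask.  */
  c = (__v2df) vec_nor (c, c);
  /* Element 0 holds the compare result; element 1 is copied from __A.  */
  return ((__m128d){c[0], __A[1]});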
}]]></programlisting></para>
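
  <para>To see what the scalar form produces without a POWER system at hand,
  the following portable C sketch models the same steps. The
  <literal>v128</literal> union and the <literal>model_</literal> functions are
  hypothetical names assumed for this example only, not part of any header.
  The complement comes from <emphasis role="bold">nor</emphasis> with both
  inputs the same, element 0 receives the compare mask, and element 1 is passed
  through from the first operand.
  <programlisting><![CDATA[#include <stdint.h>

/* Hypothetical model of a 128-bit register viewed as doubles or as bits. */
typedef union { double d[2]; uint64_t u[2]; } v128;

/* Bitwise NOR of two 64-bit lanes. */
static uint64_t model_nor (uint64_t a, uint64_t b)
{
  return ~(a | b);
}

/* NOT built from NOR, using the identity ~(A | A) == ~A. */
static uint64_t model_not (uint64_t a)
{
  return model_nor (a, a);
}

/* Model of the scalar not-equal compare on element 0. */
static v128 model_cmpneq_sd (v128 a, v128 b)
{
  v128 r;
  uint64_t eq = (a.d[0] == b.d[0]) ? UINT64_MAX : 0;
  r.u[0] = model_not (eq);   /* all ones when the scalars differ */
  r.u[1] = a.u[1];           /* upper element passes through from a */
  return r;
}]]></programlisting></para>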

  <para>The Intel Intrinsics also include the not forms of the relational
  compares:
  <programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnlt_pd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpnltpd ((__v2df)__A, (__v2df)__B);
}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnle_pd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpnlepd ((__v2df)__A, (__v2df)__B);
}]]></programlisting></para>

  <para>The PowerISA, the OpenPOWER ABI, and the GCC PowerPC Altivec Built-in
  documentation do not provide any direct equivalents to the not less than
  class of compares. Again you don't really need them if you know Boolean
  algebra. We can use identities like
  {<emphasis role="bold">not</emphasis> (A &lt; B) iff A &gt;= B} and
  {<emphasis role="bold">not</emphasis> (A &lt;= B) iff A &gt; B}.
  So the PPC64LE implementation follows:
  <programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnlt_pd (__m128d __A, __m128d __B)
{
  return ((__m128d)vec_cmpge (__A, __B));
}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnle_pd (__m128d __A, __m128d __B)
{
  return ((__m128d)vec_cmpgt (__A, __B));
}]]></programlisting></para>
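
  <para>The two identities can be checked on scalar doubles with a small
  portable C sketch. The helper names below are hypothetical and exist only for
  this example. One point worth verifying for your own application is the
  behavior with NaN operands. Assuming IEEE 754 comparison semantics (for
  example, no <literal>-ffast-math</literal>), every ordered compare involving
  a NaN is false, so {<emphasis role="bold">not</emphasis> (A &lt; B)} is true
  while {A &gt;= B} is false. The identities therefore agree for ordinary
  values but not when either operand is a NaN.
  <programlisting><![CDATA[#include <math.h>
#include <stdbool.h>

/* Hypothetical scalar helpers for the two sides of each identity. */
static bool not_lt (double a, double b) { return !(a < b); }
static bool ge (double a, double b)     { return a >= b; }
static bool not_le (double a, double b) { return !(a <= b); }
static bool gt (double a, double b)     { return a > b; }

/* True when both identities give matching answers for this pair. */
static bool identities_agree (double a, double b)
{
  return not_lt (a, b) == ge (a, b)
         && not_le (a, b) == gt (a, b);
}]]></programlisting></para>

  <para>In this sketch <literal>identities_agree</literal> returns true for
  pairs such as (1.0, 2.0), (2.0, 1.0), and (3.0, 3.0), and false for
  (NAN, 1.0).</para>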

  <para>These patterns repeat for the scalar version of the
  <emphasis role="bold">not</emphasis> compares. And
  in general the larger pattern described in this chapter applies to the other
  float and integer types with similar interfaces.</para>


</section>