Programming-Guides/Porting_Vector_Intrinsics/sec_vec_or_not.xml

<?xml version="1.0" encoding="UTF-8"?>
<!--
  Copyright (c) 2017 OpenPOWER Foundation
  
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
  
-->
<section xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="5.0"
  xml:id="sec_vec_or_not">
  <title>To vec_not or not</title>
  
  <para>Now lets look at a similar example that adds some surprising 
  complexity. When we look at the negated compare forms we can not find
  exact matches in the PowerISA. But a little knowledge of boolean
  algebra can show the way to the equivalent functions.</para>

  <para>First the X86 compare not equal case where we might expect to
  find the equivalent vec_cmpne builtins for PowerISA:
  <programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_pd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpneqpd ((__v2df)__A, (__v2df)__B);
}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_sd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpneqsd ((__v2df)__A, (__v2df)__B);
}]]></programlisting></para>

  <para>Well not exactly. Looking at the OpenPOWER ABI document we see a 
  reference to 
  <literal>vec_cmpne</literal> for all numeric types. But when we look in the current 
  GCC 6 documentation we find that 
  <literal>vec_cmpne</literal> is not on the list. So it is planned 
  in the ABI, but not implemented yet.</para>

  <para>Looking at the PowerISA 2.07B we find a VSX Vector Compare Equal to 
  Double-Precision but no Not Equal. In fact we see only vector double compare 
  instructions for greater than and greater than or equal in addition to the 
  equal compare. Not only can't we find a not equal, there is no less than or 
  less than or equal compares either.</para>

  <para>So what is going on here? Partially this is the Reduced Instruction 
  Set Computer (RISC) design philosophy. In this case the compiler can generate 
  all the required compares using the existing vector instructions and simple 
  transforms based on Boolean algebra. So 
  <literal>vec_cmpne(A,B)</literal> is simply <literal>vec_not 
  (vec_cmpeq(A,B))</literal>. And <literal>vec_cmplt(A,B)</literal> is simply 
  <literal>vec_cmpgt(B,A)</literal> based on the 
  identity A &lt; B <emphasis><emphasis role="bold">iff</emphasis></emphasis> B &gt; A. 
  Similarly <literal>vec_cmple(A,B)</literal> is implemented as 
  <literal>vec_cmpge(B,A)</literal>.</para>

  <para>What a minute, there is no <literal>vec_not()</literal> either. Can not find it in the 
  PowerISA, the OpenPOWER ABI, or the GCC PowerPC Altivec Built-in documentation. 
  There is no <literal>vec_move()</literal> either! How can this possibly work?</para>

  <para>This is RISC philosophy again. We can always use a logical 
  instruction (like bit wise <emphasis role="bold">and</emphasis> or 
  <emphasis role="bold">or</emphasis>) to effect a move, given that we also have 
  nondestructive 3 register instruction forms. In the PowerISA most instruction 
  have two input registers and a separate result register. So if the result 
  register number is  different from either input register then the inputs are 
  not clobbered (nondestructive). Of course nothing prevents you from specifying 
  the same register for both inputs or even all three registers (result and both 
  inputs).  And some times it is useful.</para>

  <para>The statement <literal>B = vec_or (A,A)</literal> is is effectively a vector move/copy 
  from <literal>A</literal> to <literal>B</literal>. And <literal>A = vec_or (A,A)</literal> is obviously a 
  <emphasis role="bold"><literal>nop</literal></emphasis> (no operation). In fact the 
  PowerISA defines the preferred <literal>nop</literal> and register move for vector registers
  in this way.</para>

  <para>The PowerISA implements the logical operators 
  <emphasis role="bold">nor</emphasis> (<emphasis role="bold">not or</emphasis>) 
  and <emphasis role="bold">nand</emphasis> (<emphasis role="bold">not and</emphasis>).  
  The PowerISA provides these instruction for 
  fixed point and vector logical operations. So <literal>vec_not(A)</literal> 
  can be implemented as <literal>vec_nor(A,A)</literal>. 
  So for the implementation of _mm_cmpne we propose the following:
  <programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_pd (__m128d __A, __m128d __B)
{
  __v2df temp = (__v2df ) vec_cmpeq ((__v2df) __A, (__v2df)__B);
  return ((__m128d)vec_nor (temp, temp));
}
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_sd (__m128d __A, __m128d __B)
{
	__v2df a, b, c;
	a = vec_splat(__A, 0);
	b = vec_splat(__B, 0);
	c = (__v2df)vec_cmpeq(a, b);
	c = (__v2df)vec_nor(c, c);
	return ((__m128d){c[0], __A[1]});
}]]></programlisting></para>

  <para>The Intel Intrinsics also include the not forms of the relational 
  compares:
  <programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnlt_pd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpnltpd ((__v2df)__A, (__v2df)__B);
}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnle_pd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpnlepd ((__v2df)__A, (__v2df)__B);
}]]></programlisting></para>

  <para>The PowerISA and OpenPOWER ABI, or GCC PowerPC Altivec Built-in 
  documentation do not provide any direct equivalents to the  not greater than 
  class of compares. Again you don't really need them if you know Boolean 
  algebra. We can use identities like 
  {<emphasis role="bold">not</emphasis> (A &lt; B) iff A &gt;= B} and 
  {<emphasis role="bold">not</emphasis> (A 
  &lt;= B) iff A &gt; B}. So the PPC64LE implementation follows:
  <programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnlt_pd (__m128d __A, __m128d __B)
{
  return ((__m128d)vec_cmpge (__A, __B));
}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnle_pd (__m128d __A, __m128d __B)
{
  return ((__m128d)vec_cmpgt (__A, __B));
}]]></programlisting></para>

  <para>These patterns repeat for the scalar version of the 
  <emphasis role="bold">not</emphasis> compares. And 
  in general the larger pattern described in this chapter applies to the other 
  float and integer types with similar interfaces.</para>


</section>