<?xml version="1.0" encoding="UTF-8"?>
<!--
  Copyright (c) 2017 OpenPOWER Foundation

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.

-->
<section xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="5.0"
  xml:id="sec_other_intrinsic_examples">
  <title>Examples implemented using other intrinsics</title>

  <para>Some intrinsic implementations are defined in terms of other
  intrinsics. For example:
  <programlisting><![CDATA[/* Create a vector with element [0] as F and the rest zero.  */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_set_sd (double __F)
{
  return __extension__ (__m128d){ __F, 0.0 };
}

/* Create a vector with element [0] as *P and the rest zero.  */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_load_sd (double const *__P)
{
  return _mm_set_sd (*__P);
}]]></programlisting></para>

  <para>Using only part (one fourth or one half) of the SSE XMM
  register while leaving the rest unchanged (or forcing it to zero) is specific
  to SSE scalar operations and can generate complicated (sub-optimal) PowerISA
  code.  In this case <emphasis role="bold"><literal>_mm_load_sd</literal></emphasis>
  passes the dereferenced double value to
  <emphasis role="bold"><literal>_mm_set_sd</literal></emphasis>, which
  uses C vector initializer notation to combine (merge) that
  scalar double with a 0.0 constant into a vector double.</para>
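  <para>The same merge can be reproduced with the GCC vector extension,
  independent of any intrinsic header. The following is a minimal sketch,
  assuming GCC or Clang with <literal>vector_size</literal> support; the type
  <literal>v2df</literal> and the functions <literal>set_sd_sketch</literal> and
  <literal>load_sd_sketch</literal> are hypothetical stand-ins for
  <literal>__m128d</literal>, <literal>_mm_set_sd</literal>, and
  <literal>_mm_load_sd</literal>. It shows element [0] receiving the scalar,
  element [1] forced to 0.0, and the load variant reduced to the set variant
  applied to a dereferenced pointer.
  <programlisting><![CDATA[/* Hypothetical stand-in for __m128d: two doubles in a 16-byte vector
   (GCC/Clang vector extension, not an intrinsic header type).  */
typedef double v2df __attribute__ ((__vector_size__ (16)));

/* Sketch of _mm_set_sd: element [0] is F, element [1] is 0.0.  */
static inline v2df
set_sd_sketch (double f)
{
  return (v2df){ f, 0.0 };
}

/* Sketch of _mm_load_sd: dereference P and reuse the set helper.  */
static inline v2df
load_sd_sketch (double const *p)
{
  return set_sd_sketch (*p);
}]]></programlisting></para>

  <para>Compiling this sketch with <literal>-O2 -S</literal> for PPC64LE is one
  way to inspect the instruction sequence the compiler chooses for the merge.</para>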

  <para>While code like this should work as-is for PPC64LE, you should look
  at the generated code and assess whether it is reasonable.  In this case the
  code is not awful (a load double splat, a vector xor to generate 0.0s, then an
  <literal>xxmrghd</literal>
  to combine <literal>__F</literal> and 0.0).  Other examples may generate
  sub-optimal code and justify a rewrite to PowerISA scalar or vector code (<link
  xlink:href="https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html#PowerPC-AltiVec_002fVSX-Built-in-Functions">
  <emphasis role="italic">GCC PowerPC AltiVec Built-in Functions</emphasis></link>
  or inline assembler).</para>

  <note><para>Try using the existing C code if you can, but check what the
  compiler generates.  If the generated code is horrendous, it may be worth the
  effort to write a PowerISA-specific equivalent. For code that makes extensive use
  of MMX or SSE scalar intrinsics, you will be better off rewriting it to use
  standard C scalar types and letting the GCC compiler handle the details
  (see <xref linkend="sec_prefered_methods"/>).</para></note>
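  <para>As an illustration of that advice, consider a hypothetical fragment
  that uses SSE scalar intrinsics to add a scaled value to a double in memory.
  The sketch below assumes the original looked like the intrinsic sequence in
  its leading comment (which is not compiled) and replaces it with ordinary
  <literal>double</literal> arithmetic; the function names
  <literal>scaled_add_scalar</literal> and <literal>scaled_add_array</literal>
  are hypothetical. Because only element [0] of each vector carries data, the
  intrinsic sequence reduces to one multiply and one add, and the plain C form
  leaves GCC free to use scalar floating-point instructions (or to vectorize the
  loop) as it sees fit for the target.
  <programlisting><![CDATA[/* Hypothetical SSE scalar original (for reference only; not compiled):
 *
 *   __m128d x = _mm_load_sd (px);
 *   __m128d y = _mm_load_sd (py);
 *   x = _mm_add_sd (x, _mm_mul_sd (y, _mm_set_sd (s)));
 *   _mm_store_sd (px, x);
 *
 * Only element [0] is stored, so the sequence computes
 * *px = *px + *py * s.  */

/* Plain C scalar equivalent of the sequence above.  */
static inline void
scaled_add_scalar (double *px, double const *py, double s)
{
  *px = *px + *py * s;
}

/* The same operation over an array; the loop is left to the compiler,
   which may emit scalar or vector code for the target.  */
static void
scaled_add_array (double *x, double const *y, double s, unsigned long n)
{
  for (unsigned long i = 0; i < n; i++)
    x[i] = x[i] + y[i] * s;
}]]></programlisting></para>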

</section>