You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

69 lines
3.1 KiB

<?xml version="1.0" encoding="UTF-8"?>
Copyright (c) 2017 OpenPOWER Foundation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
See the License for the specific language governing permissions and
limitations under the License.
<section xmlns=""
<title>Examples implemented using other intrinsics</title>
<para>Some intrinsic implementations are defined in terms of other
intrinsics. For example.
<programlisting><![CDATA[/* Create a vector with element [0] as F and the rest zero. */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_set_sd (double __F)
return __extension__ (__m128d){ __F, 0.0 };
/* Create a vector with element [0] as *P and the rest zero. */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_load_sd (double const *__P)
return _mm_set_sd (*__P);
<para>This notion of using part (one fourth or half) of the SSE XMM
register and leaving the rest unchanged (or forced to zero) is specific to SSE
scalar operations and can generate some complicated (sub-optimal) PowerISA
code.  In this case <emphasis role="bold"><literal>_mm_load_sd</literal></emphasis>
passes the dereferenced double value  to
<emphasis role="bold"><literal>_mm_set_sd</literal></emphasis> which
uses C vector initializer notation to combine (merge) that
double scalar value with a scalar 0.0 constant into a vector double.</para>
<para>While code like this should work as-is for PPC64LE, you should look
at the generated code and assess if it is reasonable.  In this case the code
is not awful (a load double splat, vector xor to generate 0.0s, then a
to combine __F and 0.0).  Other examples may generate sub-optimal code and
justify a rewrite to PowerISA scalar or vector code (<link
<emphasis role="italic">GCC PowerPC AltiVec Built-in Functions</emphasis></link>
or inline assembler). </para>
<note><para>Try using the existing C code if you can, but check on what the
compiler generates.  If the generated code is horrendous, it may be worth the
effort to write a PowerISA specific equivalent. For codes making extensive use
of MMX or SSE scalar intrinsics you will be better off rewriting to use
standard C scalar types and letting the GCC compiler handle the details
(see <xref linkend="sec_prefered_methods"/>).</para></note>