<?xml version="1.0" encoding="UTF-8"?>
<!--
  Copyright (c) 2017 OpenPOWER Foundation

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.

-->
<section xmlns="http://docbook.org/ns/docbook"
         xmlns:xi="http://www.w3.org/2001/XInclude"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         version="5.0"
         xml:id="sec_other_intrinsic_examples">
  <title>Examples implemented using other intrinsics</title>

  <para>Some intrinsic implementations are defined in terms of other
  intrinsics. For example:
  <programlisting><![CDATA[/* Create a vector with element [0] as F and the rest zero. */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_set_sd (double __F)
{
  return __extension__ (__m128d){ __F, 0.0 };
}

/* Create a vector with element [0] as *P and the rest zero. */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_load_sd (double const *__P)
{
  return _mm_set_sd (*__P);
}]]></programlisting></para>

  <para>This notion of using part (one fourth or one half) of the SSE XMM
  register while leaving the rest unchanged (or forced to zero) is specific to SSE
  scalar operations and can generate complicated (sub-optimal) PowerISA
  code. In this case, <emphasis role="bold"><literal>_mm_load_sd</literal></emphasis>
  passes the dereferenced double value to
  <emphasis role="bold"><literal>_mm_set_sd</literal></emphasis>, which
  uses C vector initializer notation to combine (merge) that
  double scalar value with a scalar 0.0 constant into a vector double.</para>
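
  <para>The delegation and the vector initializer can be tried in isolation
  with the GCC vector extension. The following is a minimal sketch, not the
  header's implementation: the type <literal>v2df_sketch</literal> and the
  functions <literal>set_sd_sketch</literal> and
  <literal>load_sd_sketch</literal> are hypothetical stand-ins for
  <literal>__m128d</literal>, <literal>_mm_set_sd</literal>, and
  <literal>_mm_load_sd</literal>, chosen so the fragment compiles without
  any intrinsic header.</para>

  <programlisting><![CDATA[
```c
/* Hypothetical stand-in for __m128d: two doubles in one 16-byte vector
   (GCC/Clang vector extension). */
typedef double v2df_sketch __attribute__ ((vector_size (16)));

/* Mirrors _mm_set_sd: element [0] is f, element [1] is forced to zero. */
static inline v2df_sketch
set_sd_sketch (double f)
{
  return (v2df_sketch){ f, 0.0 };
}

/* Mirrors _mm_load_sd: dereference the pointer, then delegate. */
static inline v2df_sketch
load_sd_sketch (const double *p)
{
  return set_sd_sketch (*p);
}
```
]]></programlisting>

  <para>Calling <literal>load_sd_sketch</literal> on a pointer to 3.5 should
  yield a vector whose element [0] is 3.5 and whose element [1] is 0.0.</para>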

  <para>While code like this should work as-is for PPC64LE, you should look
  at the generated code and assess whether it is reasonable. In this case the code
  is not awful (a load double splat, a vector xor to generate 0.0s, then an
  <literal>xxmrghd</literal>
  to combine <literal>__F</literal> and 0.0). Other examples may generate sub-optimal code and
  justify a rewrite to PowerISA scalar or vector code (using <link
  xlink:href="https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html#PowerPC-AltiVec_002fVSX-Built-in-Functions">
  <emphasis role="italic">GCC PowerPC AltiVec Built-in Functions</emphasis></link>
  or inline assembler).</para>
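
  <para>To show the shape such a rewrite might take, the sketch below spells
  out the same splat, zero, and merge sequence with the
  <literal>vec_splats</literal> and <literal>vec_mergeh</literal> built-ins
  from <literal>altivec.h</literal>. It is an illustration under stated
  assumptions, not a recommendation for this particular intrinsic: the names
  <literal>v2df_sketch</literal> and <literal>load_sd_vsx_sketch</literal>
  are hypothetical, the built-in path is assumed to be selected by the
  <literal>__VSX__</literal> predefined macro, and the generic initializer is
  used everywhere else. Whether the built-in form generates better code than
  the initializer still has to be checked in the compiler output.</para>

  <programlisting><![CDATA[
```c
#if defined(__VSX__)
#include <altivec.h>
#endif

/* Hypothetical stand-in for __m128d (GCC/Clang vector extension). */
typedef double v2df_sketch __attribute__ ((vector_size (16)));

/* Hypothetical PowerISA-specific variant of _mm_load_sd. */
static inline v2df_sketch
load_sd_vsx_sketch (const double *p)
{
#if defined(__VSX__)
  /* Splat the loaded double and a zero, then merge the high
     doublewords so that element [0] is *p and element [1] is 0.0. */
  __vector double val = vec_splats (*p);
  __vector double zero = vec_splats (0.0);
  return (v2df_sketch) vec_mergeh (val, zero);
#else
  /* Generic path: let the compiler choose the instruction sequence. */
  return (v2df_sketch){ *p, 0.0 };
#endif
}
```
]]></programlisting>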

  <note><para>Try using the existing C code if you can, but check what the
  compiler generates. If the generated code is horrendous, it may be worth the
  effort to write a PowerISA-specific equivalent. For code that makes extensive use
  of MMX or SSE scalar intrinsics, you will be better off rewriting it to use
  standard C scalar types and letting GCC handle the details
  (see <xref linkend="sec_prefered_methods"/>).</para></note>
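
  <para>As a small illustration of the scalar rewrite suggested in the note,
  the sketch below sums an array with plain <literal>double</literal>
  arithmetic. The function name <literal>sum_doubles_sketch</literal> and the
  SSE-scalar loop in the leading comment are hypothetical; they are not
  taken from any particular code base.</para>

  <programlisting><![CDATA[
```c
#include <stddef.h>

/* Hypothetical SSE-scalar original, shown only for comparison:

     __m128d acc = _mm_setzero_pd ();
     for (i = 0; i < n; i++)
       acc = _mm_add_sd (acc, _mm_load_sd (&x[i]));
     _mm_store_sd (&sum, acc);

   The same computation with standard C scalar types leaves register
   allocation and instruction selection to the compiler.  */
static double
sum_doubles_sketch (const double *x, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += x[i];
  return sum;
}
```
]]></programlisting>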

</section>