Examples implemented using other intrinsics

Some intrinsic implementations are defined in terms of other intrinsics. For example, _mm_load_sd is implemented by calling _mm_set_sd.
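To illustrate one intrinsic built on another, here is a minimal sketch of that pattern using GCC generic vector extensions. The names sketch_m128d, sketch_mm_set_sd and sketch_mm_load_sd are hypothetical stand-ins for __m128d, _mm_set_sd and _mm_load_sd, chosen so the sketch does not clash with any real intrinsic header. It is not the actual header text. The parameter names __F and __P only mirror the style of such headers.

```c
/* Hypothetical stand-in for __m128d: a 16-byte GCC generic vector
   holding two doubles. */
typedef double sketch_m128d __attribute__ ((__vector_size__ (16)));

/* Modeled on _mm_set_sd: use C vector initializer notation to merge
   the scalar into element 0 and force element 1 to 0.0. */
static inline sketch_m128d
sketch_mm_set_sd (double __F)
{
  return __extension__ (sketch_m128d){ __F, 0.0 };
}

/* Modeled on _mm_load_sd: defined in terms of the "set" intrinsic by
   passing the dereferenced double straight through. */
static inline sketch_m128d
sketch_mm_load_sd (double const *__P)
{
  return sketch_mm_set_sd (*__P);
}
```

Loading from a double holding 3.5 yields a vector whose element 0 is 3.5 and whose element 1 is 0.0.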
The notion of using part of the 128-bit SSE XMM register (one fourth for a float, half for a double) while leaving the rest unchanged (or forcing it to zero) is specific to SSE scalar operations, and it can generate complicated, sub-optimal PowerISA code. In this case _mm_load_sd passes the dereferenced double value to _mm_set_sd, which uses C vector initializer notation to merge that scalar double with a 0.0 constant into a vector double.

While code like this should work as-is for PPC64LE, you should look
at the generated code and assess whether it is reasonable. In this case the code is not awful: a load double splat, a vector xor to generate the 0.0s, then an xxmrghd to combine __F and 0.0. Other examples may generate sub-optimal code that justifies a rewrite to PowerISA scalar or vector code (using the GCC PowerPC AltiVec Built-in Functions or inline assembler). Try the existing C code if you can, but check what the compiler generates. If the generated code is horrendous, it may be worth the effort to write a PowerISA-specific equivalent. For code making extensive use of MMX or SSE scalar intrinsics, you will be better off rewriting it to use standard C scalar types and letting the GCC compiler handle the details (see ).
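The advice to prefer standard C scalar types can be made concrete with a small before/after sketch. Both functions sum an array. The first imitates the SSE scalar style, where each value is merged into element 0 of a vector and added with _mm_add_sd semantics (element 0 gets a[0] + b[0], element 1 is copied from a). The second uses a plain double. The type sketch_v2df and the helper names are hypothetical, and no claim is made here about the exact instructions either version produces. Inspect the compiler output as described above.

```c
/* Hypothetical 2 x double generic vector standing in for __m128d. */
typedef double sketch_v2df __attribute__ ((__vector_size__ (16)));

/* Modeled on _mm_add_sd semantics: element 0 receives a[0] + b[0],
   element 1 is carried through unchanged from a. */
static inline sketch_v2df
sketch_add_sd (sketch_v2df a, sketch_v2df b)
{
  sketch_v2df r = a;
  r[0] = a[0] + b[0];
  return r;
}

/* SSE-scalar style: the running sum lives in element 0 of a vector,
   and every input is first merged into a vector with a 0.0 partner. */
static double
sum_intrinsic_style (const double *x, int n)
{
  sketch_v2df acc = { 0.0, 0.0 };
  for (int i = 0; i < n; i++)
    {
      sketch_v2df v = { x[i], 0.0 };   /* like _mm_load_sd */
      acc = sketch_add_sd (acc, v);    /* like _mm_add_sd */
    }
  return acc[0];                       /* like _mm_cvtsd_f64 */
}

/* Plain C scalar version: the compiler is free to keep the sum in a
   floating-point register and choose the scalar instructions itself. */
static double
sum_scalar (const double *x, int n)
{
  double acc = 0.0;
  for (int i = 0; i < n; i++)
    acc += x[i];
  return acc;
}
```

Both versions compute the same result. The scalar one simply gives the compiler nothing to merge or extract, which is why it is the easier starting point when porting code that relies heavily on scalar intrinsics.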