The types used for intrinsics
The type system for Intel intrinsics is a little strange; the declarations in
xmmintrin.h are a good example.
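As a rough illustration, GCC's x86 xmmintrin.h declares its float vector types approximately as shown below. This is a paraphrase, not a verbatim copy, so check the header installed on your system. The helper sketch_add_ps is hypothetical and is added only to show how the two types are used together.

```c
/* Paraphrase of the x86 declarations; verify against your own xmmintrin.h.
   Do not combine this sketch with the real header, the names would clash. */

/* Public API type: allowed to alias other vector types and their
   scalar components. */
typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));

/* Internal type used inside the intrinsic implementations. */
typedef float __v4sf __attribute__ ((__vector_size__ (16)));

/* Hypothetical helper (not part of the header): the usual pattern is to
   cast the API type to the internal type, operate, and cast back. */
static inline __m128
sketch_add_ps (__m128 a, __m128 b)
{
  return (__m128) ((__v4sf) a + (__v4sf) b);
}
```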
So there is one set of types used in the function prototypes of the API, and
a separate set of internal types used in the implementation. The API types
carry the special attribute __may_alias__. From the GCC documentation:
Accesses through pointers to types with this attribute are not subject
to type-based alias analysis, but are instead assumed to be able to alias any
other type of objects. ... This extension exists to support some vector APIs,
in which pointers to one vector type are permitted to alias pointers to a
different vector type.
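A minimal sketch of what the attribute permits, assuming GCC's C vector extensions. The type names and sum_four_ints are hypothetical and do not come from any Intel or GCC header. The function reads an int array through a pointer to a vector of long long, which is the pattern the Intel API encourages when an int buffer is cast to an __m128i pointer.

```c
/* Hypothetical types for illustration only. */

/* Vector of long long that may alias any other type. __aligned__ (1) lets
   the pointer refer to storage that is not 16-byte aligned. */
typedef long long v2di_alias
  __attribute__ ((__vector_size__ (16), __may_alias__, __aligned__ (1)));

/* Plain vector of int, used only to view the loaded bits as four lanes. */
typedef int v4si __attribute__ ((__vector_size__ (16)));

/* Sums the four ints at p. The load goes through a long long vector type;
   because that type carries __may_alias__, the access is not subject to
   type-based alias analysis. */
static inline int
sum_four_ints (const int *p)
{
  v2di_alias raw = *(const v2di_alias *) p;
  v4si lanes = (v4si) raw;   /* same 128 bits, viewed as 4 x int */
  return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Without __may_alias__ on the vector type, the same cast and dereference would fall under the normal aliasing rules, and the compiler would be free to assume the load does not touch the int array.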
So there are a couple of issues here: 1) the API seems to force the compiler
to assume aliasing of any parameter passed by reference. Normally the compiler
assumes that objects of different types do not overlap in storage, which
allows more optimization. 2) the data type used at the interface may not be
the correct type for the implied operation. For example, __m128i (which is
defined as vector long long) is also used for parameters and return values
that hold vector [char | short | int] data.
This may not matter when using x86 built-ins, but it does matter when the
implementation uses C vector extensions or, in our case, PowerPC generic
vector built-ins. For the latter cases the operand type must be correct for
the compiler to generate the correct form (char, short, int, long) of the
generic built-in operation. There is also concern that excessive use of
__may_alias__ will limit compiler optimization. We are not sure how important
this attribute is to the correct operation of the API, so at a later stage we
should experiment with removing it from our implementation for PowerPC.
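The point about element types can be made concrete with C vector extensions. In the sketch below, api_m128i is a hypothetical stand-in for __m128i, and both function names are invented for illustration. Adding two values in the interface type performs a 2 x 64-bit add, whereas a 32-bit element add requires a cast to the matching element type first.

```c
/* Hypothetical stand-in for the API type (vector of long long). */
typedef long long api_m128i
  __attribute__ ((__vector_size__ (16), __may_alias__));

/* Internal type with the element width the operation actually implies.
   Unsigned elements make wrap-around on overflow well defined. */
typedef unsigned int v4su __attribute__ ((__vector_size__ (16)));

/* Operates on the type as declared at the interface: two 64-bit adds,
   so a carry can cross from one 32-bit half into the other. */
static inline api_m128i
add_as_declared (api_m128i a, api_m128i b)
{
  return a + b;
}

/* Casts to the implied element type first: four independent 32-bit adds. */
static inline api_m128i
add_epi32_sketch (api_m128i a, api_m128i b)
{
  return (api_m128i) ((v4su) a + (v4su) b);
}
```

With inputs whose first two 32-bit lanes are 0xFFFFFFFF and 1 respectively, add_epi32_sketch wraps both lanes to zero, while add_as_declared leaves a carry of 1 in one of them. The same distinction applies when the cast selects which form of a PowerPC generic vector built-in the compiler generates.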
The good news is that PowerISA has good support for 128-bit vectors
and (with the addition of VSX) all the required vector data types (char,
short, int, long, float, double). However, Intel supports a wider variety of
vector sizes than PowerISA does. This started with the 64-bit MMX vector
support that preceded SSE and extends to the 256-bit and 512-bit vectors of
AVX, AVX2, and AVX512 that followed SSE.
Within the GCC Intel intrinsic implementation these are all implemented as
vector attribute extensions of the appropriate size
(__vector_size__ (8 | 16 | 32 | 64)). For the PowerPC target, GCC currently
only supports the native __vector_size__ (16), which we can support directly
in VMX/VSX registers and the associated instructions. GCC will compile code
that uses other __vector_size__ values, but the resulting types are treated as
simple arrays of the element type. This does not allow the compiler to use the
vector registers and vector instructions for these (non-native) vectors. So
what is a programmer to do?
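The difference between native and non-native sizes can be seen in a small sketch. The type and function names here are hypothetical. The code compiles with GCC on any target, but how each type is lowered (vector registers versus element-by-element code) depends on the target and the enabled instruction sets. Operands are passed by pointer to avoid ABI warnings on targets where the wider type is not native.

```c
/* Hypothetical vector types of different sizes. */
typedef int v4si __attribute__ ((__vector_size__ (16)));  /* SSE-sized, native on VMX/VSX */
typedef int v8si __attribute__ ((__vector_size__ (32)));  /* AVX-sized */

/* Adds two 128-bit vectors; maps onto a single vector add where 128-bit
   vectors are native. */
static inline void
add_v4si (const v4si *a, const v4si *b, v4si *sum)
{
  *sum = *a + *b;
}

/* Adds two 256-bit vectors. The source is just as simple, but where this
   size is not native the compiler has to lower the operation itself. */
static inline void
add_v8si (const v8si *a, const v8si *b, v8si *sum)
{
  *sum = *a + *b;
}
```

Comparing the generated assembly for the two functions (for example with gcc -O2 -S) is a quick way to see what the compiler does with a given __vector_size__ on a given target.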