The types used for intrinsics

The type system for Intel intrinsics is a little strange. For example, xmmintrin.h declares the API type __m128 with the __may_alias__ attribute, and separately declares internal types such as __v4sf without it. So there is one set of types used in the function prototypes of the API, and a set of internal types used in the implementation. Notice the special attribute __may_alias__. From the GCC documentation:
Accesses through pointers to types with this attribute are not subject to type-based alias analysis, but are instead assumed to be able to alias any other type of objects. ... This extension exists to support some vector APIs, in which pointers to one vector type are permitted to alias pointers to a different vector type.
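The split between API types and internal types can be sketched with GCC's vector extensions. The names api_m128, api_m128_u, impl_v4sf, api_loadu_ps and api_add_ps below are hypothetical stand-ins chosen for illustration; they are modeled loosely on the pattern described above and are not the declarations from xmmintrin.h. The sketch assumes a compiler that supports the GCC vector_size and may_alias attributes.

```c
/* Hypothetical API type: used in function prototypes. The may_alias
   attribute exempts accesses through pointers to it from type-based
   alias analysis. */
typedef float api_m128 __attribute__ ((__vector_size__ (16), __may_alias__));

/* Hypothetical unaligned variant of the API type, used here only to
   read four floats from storage that is not 16-byte aligned. */
typedef float api_m128_u
  __attribute__ ((__vector_size__ (16), __may_alias__, __aligned__ (1)));

/* Hypothetical internal type: a plain vector of four floats, used
   inside the implementation and subject to normal alias analysis. */
typedef float impl_v4sf __attribute__ ((__vector_size__ (16)));

/* Load four floats through a pointer to the aliasing API type. */
static inline api_m128
api_loadu_ps (const float *p)
{
  return *(const api_m128_u *) p;
}

/* The prototype uses the API type; the body casts to the internal
   type before applying the element-wise vector addition. */
static inline api_m128
api_add_ps (api_m128 a, api_m128 b)
{
  return (api_m128) ((impl_v4sf) a + (impl_v4sf) b);
}
```

A caller only ever sees api_m128; the cast to impl_v4sf is a reinterpretation of the same 16 bytes and is confined to the implementation.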
There are a couple of issues here:

- The API seems to force the compiler to assume aliasing of any parameter passed by reference.
- The data type used at the interface may not be the correct type for the implied operation.

Normally the compiler assumes that parameters of different types do not overlap in storage, which allows more optimization. However, parameters for different vector element sizes [char | short | int | long] are all passed and returned as type __m128i (defined as vector long long).

This may not matter when using x86 built-ins, but it does matter when the implementation uses C vector extensions or, in our case, PowerPC generic vector built-ins. For the latter cases the operand type must be correct for the compiler to generate the operation for the correct element type (char, short, int, long).

There is also concern that excessive use of __may_alias__ will limit compiler optimization. We are not sure how important this attribute is to the correct operation of the API, so at a later stage we should experiment with removing it from our implementation for PowerPC.

The good news is that PowerISA has good support for 128-bit vectors and (with the addition of VSX) all the required vector data types (char, short, int, long, float, double). However, Intel supports a wider variety of vector sizes than PowerISA does. This started with the 64-bit MMX vector support that preceded SSE and extends to the 256-bit and 512-bit vectors of AVX, AVX2, and AVX512 that followed SSE.

Within the GCC Intel intrinsic implementation these are all implemented as vector attribute extensions of the appropriate size (__vector_size__ (8 | 16 | 32 | 64)). For the PowerPC target, GCC currently only supports the native __vector_size__ (16). These we can support directly in VMX/VSX registers and associated instructions. GCC will compile code with other __vector_size__ values, but the resulting types are treated as simple arrays of the element type.
As a result, the compiler cannot use the vector registers and vector instructions for these non-native vectors.

So the PowerISA VMX/VSX facilities and GCC compiler support for 128-bit (16-byte) vectors and the associated vector built-ins are well matched to implementing equivalent x86 SSE intrinsic functions. However, implementing the older MMX (64-bit) and the newer AVX (256-bit and 512-bit) extensions requires more thought and some ingenuity.
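To make the earlier element-type concern concrete, the following sketch shows why a single interface type such as __m128i is not enough once the implementation relies on C vector extensions. All names here (api_m128i, impl_v8hu, impl_v4su, api_add_epi16, api_add_epi32) are hypothetical and exist only for this illustration. Unsigned internal types are an assumption made so that lane wrap-around is well defined.

```c
/* Hypothetical interface type: every integer intrinsic passes and
   returns its operands as a vector of two long long values. */
typedef long long api_m128i
  __attribute__ ((__vector_size__ (16), __may_alias__));

/* Hypothetical internal types that carry the real element width. */
typedef unsigned short impl_v8hu __attribute__ ((__vector_size__ (16)));
typedef unsigned int   impl_v4su __attribute__ ((__vector_size__ (16)));

/* Add as eight 16-bit lanes: the cast selects the element type, and
   the compiler picks the matching operation from that type. */
static inline api_m128i
api_add_epi16 (api_m128i a, api_m128i b)
{
  return (api_m128i) ((impl_v8hu) a + (impl_v8hu) b);
}

/* Add as four 32-bit lanes: same interface type, different cast,
   different result for the same input bits. */
static inline api_m128i
api_add_epi32 (api_m128i a, api_m128i b)
{
  return (api_m128i) ((impl_v4su) a + (impl_v4su) b);
}
```

With every 16-bit lane of a set to 0xFFFF and every 16-bit lane of b set to 1, the 16-bit add wraps each lane to zero, while the 32-bit add carries across the halfword boundary and leaves 0x00010000 in each word. The interface type alone cannot tell the compiler which of the two is intended.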