Intel Intrinsic functions

Intel Intrinsic functions So what is an intrinsic function? From Wikipedia:

In compiler theory, an intrinsic function is a function available for use in a given programming language whose implementation is handled specially by the compiler. Typically, it substitutes a sequence of automatically generated instructions for the original function call, similar to an inline function. Unlike an inline function though, the compiler has an intimate knowledge of the intrinsic function and can therefore better integrate it and optimize it for the situation. This is also called builtin function in many languages.

The “Intel Intrinsics” API provides access to the many instruction set extensions (Intel Technologies) that Intel has added (and continues to add) over the years. The intrinsics provided access to new instruction capabilities before the compilers could exploit them directly. Initially these intrinsic functions where defined for the Intel and Microsoft compiler and where eventually implemented and contributed to GCC. The Intel Intrinsics have a specific type and naming structure. In this naming structure, functions starts with a common prefix (MMX and SSE use '_mm' prefix, while AVX added the '_mm256' '_mm512' prefixes), then a short functional name ('set', 'load', 'store', 'add', 'mul', 'blend', 'shuffle', '…') and a suffix ('_pd', '_sd', '_pi32'...) with type and packing information. See for the list of common intrisic suffixes. Oddly many of the MMX/SSE operations are not vectors at all. There are a lot of scalar operations on a single float, double, or long long type. In effect these are scalars that can take advantage of the larger (xmm) register space. Also in the Intel 32-bit architecture they provided IEEE754 float and double types, and 64-bit integers that did not exist or were hard to implement in the base i386/387 instruction set. These scalar operations use a suffix starting with '_s' (_sd for scalar double float, _ss scalar float, and _si64 for scalar long long). True vector operations use the packed or extended packed suffixes, starting with '_p' or '_ep' (_pd for vector double, _ps for vector float, and _epi32 for vector int). The use of '_ep' seems to be reserved to disambiguate intrinsics that existed in the (64-bit vector) MMX extension from the extended (128-bit vector) SSE equivalent. For example _mm_add_pi32 is a MMX operation on a pair of 32-bit integers, while _mm_add_epi32 is an SSE2 operation on vector of 4 32-bit integers. The GCC builtins for the i386.target (includes x86 and x86_64) are not the same as the Intel Intrinsics. While they have similar intent and cover most of the same functions, they use a different naming (prefixed with __builtin_ia32_, then function name with type suffix) and uses GCC vector type modes for operand types. For example: A key difference between GCC built-ins for i386 and PowerPC is that the x86 built-ins have different names of each operation and type while the PowerPC altivec built-ins tend to have a single overloaded built-in for each operation, across a set of compatible operand types. In GCC the Intel Intrinsic header (*intrin.h) files are implemented as a set of inline functions using the Intel Intrinsic API names and types. These functions are implemented as either GCC C vector extension code or via one or more GCC builtins for the i386 target. So lets take a look at some examples from GCC's SSE2 intrinsic header emmintrin.h: Note that the _mm_add_pd is implemented direct as GCC C vector extension code., while _mm_add_sd is implemented via the GCC builtin __builtin_ia32_addsd. From the discussion above we know the _pd suffix indicates a packed vector double while the _sd suffix indicates a scalar double in a XMM register.