Those extra attributes

You may have noticed that these definitions use some special attributes:

__gnu_inline__
This attribute should be used with a function that is also declared with the inline keyword. It directs GCC to treat the function as if it were defined in gnu90 mode even when compiling in C99 or gnu99 mode. If the function is declared extern, then this definition of the function is used only for inlining. In no case is the function compiled as a standalone function, not even if you take its address explicitly. Such an address becomes an external reference, as if you had only declared the function and had not defined it. This has almost the effect of a macro. The way to use this is to put a function definition in a header file with this attribute, and put another copy of the function, without extern, in a library file. The definition in the header file causes most calls to the function to be inlined.

__always_inline__
Generally, functions are not inlined unless optimization is specified. For functions declared inline, this attribute inlines the function independent of any restrictions that otherwise apply to inlining. Failure to inline such a function is diagnosed as an error.

__artificial__
This attribute is useful for small inline wrappers which, if possible, should appear during debugging as a unit. Depending on the debug info format, it either means marking the function as artificial or using the caller location for all instructions within the inlined body.

__extension__
-pedantic and other options cause warnings for many GNU C extensions. You can prevent such warnings within one expression by writing __extension__ before the expression.

So far I have been using these attributes unchanged. But most intrinsics map the Intel intrinsic to one or more target-specific GCC builtins. Take _mm_load_pd and _mm_loadu_pd as an example (a sketch of both appears at the end of this section).

The first intrinsic (_mm_load_pd) is implemented as a C vector pointer dereference, but the comment on that definition assumes the compiler will use a movapd instruction, which requires 16-byte alignment (and raises a general-protection exception if the address is not aligned). This implies that there is a performance advantage, at least on some Intel processors, in keeping the vector aligned. The second intrinsic (_mm_loadu_pd) uses the explicit GCC builtin __builtin_ia32_loadupd to generate the movupd instruction, which handles unaligned references.

The opposite assumption applies to POWER and PPC64LE, where GCC generates the VSX lxvd2x / xxswapd instruction sequence by default, which allows unaligned references. The PowerISA equivalent for aligned vector access is the VMX lvx instruction and the vec_ld builtin, which forces quadword-aligned access (by ignoring the low-order 4 bits of the effective address). The lvx instruction does not raise alignment exceptions, but perhaps it should as part of our implementation of the Intel intrinsic. This requires that we use PowerISA VMX/VSX builtins to ensure we get the expected results.

The current prototype defines the following (also sketched below). The aligned load intrinsic adds an assert that checks alignment (to match the Intel semantic) and uses the GCC builtin vec_ld (which generates an lvx). The assert generates extra code, but this can be eliminated by defining NDEBUG at compile time. The unaligned load intrinsic uses the GCC builtin vec_vsx_ld (which for PPC64LE generates lxvd2x / xxswapd on POWER8 and simplifies to lxv or lxvx on POWER9). The same pattern applies to _mm_store_pd / _mm_storeu_pd, using vec_st and vec_vsx_st. These concepts extend to the load/store intrinsics for vector float and vector int.
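As a concrete illustration, here is a minimal sketch of the two x86 definitions in the style GCC uses for its intrinsic headers, plus an extra _mm_set_pd wrapper added purely to show __extension__ in context. Treat the typedef and exact signatures as illustrative rather than verbatim header text; it assumes an x86 compile with -msse2.

    /* Illustrative x86-style definitions (assumed, not verbatim header text). */
    typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__));

    /* Aligned load: a plain C vector pointer dereference.  The compiler is
       expected to use movapd here, which requires 16-byte alignment.  */
    extern __inline __m128d
    __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    _mm_load_pd (double const *__P)
    {
      return *(__m128d *) __P;
    }

    /* Unaligned load: an explicit x86 builtin that maps to movupd.  */
    extern __inline __m128d
    __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    _mm_loadu_pd (double const *__P)
    {
      return __builtin_ia32_loadupd (__P);
    }

    /* __extension__ in context: it keeps -pedantic quiet about the GNU
       vector initializer within this one expression.  */
    extern __inline __m128d
    __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    _mm_set_pd (double __W, double __X)
    {
      return __extension__ (__m128d){ __X, __W };
    }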
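Next, a sketch of what the prototype's PPC64LE load definitions could look like, following the description above. The vec_ld / vec_vsx_ld mapping and the alignment assert come from the text; the __m128d typedef and the pointer casts are assumptions. It assumes GCC targeting POWER8 or later with -mvsx.

    #include <altivec.h>
    #include <assert.h>

    /* Assumed mapping of the Intel type onto a VSX vector for the POWER target. */
    typedef __vector double __m128d;

    /* Aligned load: assert the Intel 16-byte alignment semantic, then use
       vec_ld (lvx), which ignores the low-order 4 bits of the address.
       Defining NDEBUG at compile time removes the assert and its extra code.  */
    extern __inline __m128d
    __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    _mm_load_pd (double const *__P)
    {
      assert (((unsigned long) __P & 0xfUL) == 0UL);
      return (__m128d) vec_ld (0, (const __vector unsigned char *) __P);
    }

    /* Unaligned load: vec_vsx_ld generates lxvd2x/xxswapd on POWER8 and
       simplifies to lxv/lxvx on POWER9.  */
    extern __inline __m128d
    __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    _mm_loadu_pd (double const *__P)
    {
      return vec_vsx_ld (0, __P);
    }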
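Continuing the same sketch, the store counterparts would follow the vec_st / vec_vsx_st pattern named above; again, the vector casts are illustrative assumptions.

    /* Aligned store: assert alignment, then vec_st (stvx).  */
    extern __inline void
    __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    _mm_store_pd (double *__P, __m128d __A)
    {
      assert (((unsigned long) __P & 0xfUL) == 0UL);
      vec_st ((__vector unsigned char) __A, 0, (__vector unsigned char *) __P);
    }

    /* Unaligned store: vec_vsx_st handles any alignment, the VSX store
       counterpart of the unaligned load above.  */
    extern __inline void
    __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    _mm_storeu_pd (double *__P, __m128d __A)
    {
      vec_vsx_st (__A, 0, __P);
    }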