Those extra attributes
You may have noticed there are some special attributes:
__gnu_inline__
This attribute should be used with a function that is also declared with the
inline keyword. It directs GCC to treat the function as if it were defined in
gnu90 mode even when compiling in C99 or gnu99 mode.
If the function is declared extern, then this definition of the function is used
only for inlining. In no case is the function compiled as a standalone function,
not even if you take its address explicitly. Such an address becomes an external
reference, as if you had only declared the function, and had not defined it. This
has almost the effect of a macro. The way to use this is to put a function
definition in a header file with this attribute, and put another copy of the
function, without extern, in a library file. The definition in the header file
causes most calls to the function to be inlined.
__always_inline__
Generally, functions are not inlined unless optimization is specified. For functions
declared inline, this attribute inlines the function independent of any
restrictions that otherwise apply to inlining. Failure to inline such a function
is diagnosed as an error.
__artificial__
This attribute is useful for small inline wrappers that if possible should appear
during debugging as a unit. Depending on the debug info format it either means
marking the function as artificial or using the caller location for all instructions
within the inlined body.
__extension__
-pedantic and other options cause warnings for many GNU C extensions. You can
prevent such warnings within one expression by writing __extension__ before the
expression.
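For instance, the __gnu_inline__ pattern described above can be sketched as
follows; the function and file names here are hypothetical, not taken from any
real header:

    /* my_math.h (hypothetical header): the extern inline copy is used
       only for inlining; taking the function's address resolves to the
       library copy below.  */
    extern __inline int
    __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    my_add (int a, int b)
    {
      return a + b;
    }

    /* my_math.c (hypothetical library file): the standalone, non-extern
       copy that is actually compiled and linked.  */
    int
    my_add (int a, int b)
    {
      return a + b;
    }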
So far I have been using these attributes unchanged.
But most intrinsics map the Intel intrinsic to one or more target-specific GCC
built-ins. For example:
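Here is roughly how GCC's x86 emmintrin.h defines the SSE2 loads _mm_load_pd
and _mm_loadu_pd (a simplified paraphrase, not the verbatim header):

    /* Paraphrased from GCC's x86 emmintrin.h (simplified).  */
    typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__));

    extern __inline __m128d
    __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    _mm_load_pd (double const *__P)
    {
      /* Plain vector dereference; the header comment says the address
         must be 16-byte aligned, so the compiler may use movapd.  */
      return *(__m128d *) __P;
    }

    extern __inline __m128d
    __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    _mm_loadu_pd (double const *__P)
    {
      /* Explicit builtin that maps to the unaligned movupd load.  */
      return __builtin_ia32_loadupd (__P);
    }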
The first intrinsic (_mm_load_pd) is implemented as a C vector pointer
dereference, but the comment assumes the compiler will use a movapd
instruction, which requires 16-byte alignment (and raises a general-protection
exception if the address is not aligned). This implies that there is a
performance advantage, at least for some Intel processors, in keeping the
vector aligned. The second intrinsic uses the explicit GCC builtin
__builtin_ia32_loadupd to generate the movupd instruction, which handles
unaligned references.
The opposite assumption applies to POWER and PPC64LE, where GCC generates the
VSX lxvd2x / xxswapd instruction sequence by default, which allows unaligned
references. The PowerISA equivalent for aligned vector access is the VMX lvx
instruction and the vec_ld builtin, which forces quadword-aligned access (by
ignoring the low-order 4 bits of the effective address). The lvx instruction
does not raise alignment exceptions, but perhaps it should as part of our
implementation of the Intel intrinsic. This requires that we use PowerISA
VMX/VSX built-ins to ensure we get the expected results.
The current prototype defines the following:
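A sketch of those definitions, based on the description in the following
paragraphs (the typedefs are simplified and the actual prototype header may
differ in detail; compile with VSX enabled, e.g. -mcpu=power8):

    #include <altivec.h>
    #include <assert.h>

    typedef __vector double __m128d __attribute__ ((__may_alias__));
    typedef __vector unsigned char __v16qu;

    extern __inline __m128d
    __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    _mm_load_pd (double const *__P)
    {
      /* Match the Intel alignment semantic; -DNDEBUG removes the check.  */
      assert (((unsigned long) __P & 0xfUL) == 0UL);
      /* vec_ld generates lvx, which ignores the low-order 4 address bits.  */
      return (__m128d) vec_ld (0, (__v16qu *) __P);
    }

    extern __inline __m128d
    __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    _mm_loadu_pd (double const *__P)
    {
      /* vec_vsx_ld handles unaligned addresses: lxvd2x/xxswapd on POWER8,
         lxv or lxvx on POWER9.  */
      return vec_vsx_ld (0, __P);
    }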
The aligned load intrinsic adds an assert which checks alignment (to match the
Intel semantic) and uses the GCC builtin vec_ld (which generates an lvx). The
assert generates extra code, but this can be eliminated by defining NDEBUG at
compile time.
The unaligned load intrinsic uses the GCC builtin vec_vsx_ld (which for PPC64LE
generates lxvd2x / xxswapd on POWER8 and simplifies to lxv or lxvx on POWER9).
And similarly for _mm_store_pd / _mm_storeu_pd, using vec_st and vec_vsx_st.
These concepts extend to the load/store intrinsics for vector float and vector
int.
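Under the same assumptions (reusing the headers and typedefs from the load
sketch above), the corresponding stores might look like this:

    extern __inline void
    __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    _mm_store_pd (double *__P, __m128d __A)
    {
      /* Aligned store: same alignment check, then vec_st (stvx).  */
      assert (((unsigned long) __P & 0xfUL) == 0UL);
      vec_st ((__v16qu) __A, 0, (__v16qu *) __P);
    }

    extern __inline void
    __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    _mm_storeu_pd (double *__P, __m128d __A)
    {
      /* Unaligned store via VSX (stxvd2x/xxswapd on POWER8).  */
      vec_vsx_st (__A, 0, __P);
    }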