<?xml version="1.0" encoding="UTF-8"?>
<!--
  Copyright (c) 2017 OpenPOWER Foundation

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.

-->
<section xmlns="http://docbook.org/ns/docbook"
         xmlns:xi="http://www.w3.org/2001/XInclude"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         version="5.0"
         xml:id="sec_extra_attributes">
  <title>Those extra attributes</title>

<para>You may have noticed that the intrinsic definitions carry some special attributes.
The GCC documentation describes them as follows:

<literallayout>__gnu_inline__

This attribute should be used with a function that is also declared with the
inline keyword. It directs GCC to treat the function as if it were defined in
gnu90 mode even when compiling in C99 or gnu99 mode.

If the function is declared extern, then this definition of the function is used
only for inlining. In no case is the function compiled as a standalone function,
not even if you take its address explicitly. Such an address becomes an external
reference, as if you had only declared the function, and had not defined it. This
has almost the effect of a macro. The way to use this is to put a function
definition in a header file with this attribute, and put another copy of the
function, without extern, in a library file. The definition in the header file
causes most calls to the function to be inlined.

__always_inline__

Generally, functions are not inlined unless optimization is specified. For
functions declared inline, this attribute inlines the function independent of any
restrictions that otherwise apply to inlining. Failure to inline such a function
is diagnosed as an error.

__artificial__

This attribute is useful for small inline wrappers that if possible should appear
during debugging as a unit. Depending on the debug info format it either means
marking the function as artificial or using the caller location for all instructions
within the inlined body.

__extension__

... -pedantic and other options cause warnings for many GNU C extensions.
You can prevent such warnings within one expression by writing __extension__
before the expression.</literallayout></para>
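<para>As a rough illustration of how these attributes combine, the following sketch
defines a small header-style wrapper and one use of
<literal>__extension__</literal>. The names <literal>demo_v2df</literal>,
<literal>demo_add_pd</literal> and <literal>demo_sum_lanes</literal> are hypothetical
and exist only for this example; the vector type is built with the generic GCC
vector extension rather than any target-specific header.</para>

<programlisting><![CDATA[
```c
/* Hypothetical 2 x double vector type (GCC vector extension). */
typedef double demo_v2df __attribute__((__vector_size__(16)));

/* Header-style definition: inlined at every call site, never emitted as a
   standalone function, and presented to the debugger as a single unit. */
extern __inline demo_v2df
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
demo_add_pd (demo_v2df a, demo_v2df b)
{
  return a + b;
}

/* __extension__ suppresses -pedantic warnings for the one expression that
   follows it, here a GNU statement expression. */
static double
demo_sum_lanes (demo_v2df a)
{
  return __extension__ ({ double s = a[0] + a[1]; s; });
}
```
]]></programlisting>

<para>Because of <literal>__always_inline__</literal>, calls to
<literal>demo_add_pd</literal> should be inlined even without optimization, so no
out-of-line copy is needed as long as the function's address is never taken.</para>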
<para>So far I have been using these attributes unchanged.</para>
<para>But most intrinsics map the Intel intrinsic to one or more
target-specific GCC builtins. For example:
<programlisting><![CDATA[/* Load two DPFP values from P. The address must be 16-byte aligned. */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_load_pd (double const *__P)
{
  return *(__m128d *)__P;
}

/* Load two DPFP values from P. The address need not be 16-byte aligned. */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_loadu_pd (double const *__P)
{
  return __builtin_ia32_loadupd (__P);
}]]></programlisting></para>
<para>The first intrinsic (_mm_load_pd) is implemented as a C vector pointer
dereference, but its comment implies that the compiler is expected to use a
<emphasis role="bold">movapd</emphasis>
instruction, which requires 16-byte alignment (and raises a general-protection
exception if the address is not aligned). This implies that there is a performance
advantage, for at least some Intel processors, in keeping the vector aligned. The second
intrinsic uses the explicit GCC builtin
<emphasis role="bold"><literal>__builtin_ia32_loadupd</literal></emphasis> to generate the
<emphasis role="bold"><literal>movupd</literal></emphasis> instruction, which handles unaligned references.</para>
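<para>The difference between the two access styles can also be expressed in portable
GNU C, without a target builtin. The sketch below is an assumption-laden alternative,
not the listing above: it uses a second vector typedef whose alignment is reduced to 1
so that the compiler may not assume 16-byte alignment. The names
<literal>sketch_v2df</literal>, <literal>sketch_v2df_u</literal>,
<literal>sketch_load_pd</literal> and <literal>sketch_loadu_pd</literal> are
hypothetical.</para>

<programlisting><![CDATA[
```c
/* Hypothetical vector types for this sketch; not the real __m128d. */
typedef double sketch_v2df
  __attribute__((__vector_size__(16), __may_alias__));
/* Same layout, but the compiler may only assume byte alignment. */
typedef double sketch_v2df_u
  __attribute__((__vector_size__(16), __may_alias__, __aligned__(1)));

/* Aligned form: the dereference lets the compiler assume 16-byte alignment. */
static inline sketch_v2df
sketch_load_pd (const double *p)
{
  return *(const sketch_v2df *) p;
}

/* Unaligned form: the reduced-alignment type forces an unaligned-safe load. */
static inline sketch_v2df
sketch_loadu_pd (const double *p)
{
  return *(const sketch_v2df_u *) p;
}
```
]]></programlisting>

<para>Calling <literal>sketch_load_pd</literal> with a pointer that is not 16-byte
aligned is undefined behavior in this sketch, which mirrors the requirement stated in
the comment on the aligned intrinsic.</para>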
<para>The opposite assumption applies to POWER and PPC64LE, where GCC
generates the VSX <emphasis role="bold"><literal>lxvd2x</literal></emphasis> /
<emphasis role="bold"><literal>xxswapd</literal></emphasis>
instruction sequence by default, which
allows unaligned references. The PowerISA equivalent for aligned vector access
is the VMX <emphasis role="bold"><literal>lvx</literal></emphasis> instruction and the
<emphasis role="bold"><literal>vec_ld</literal></emphasis> builtin, which forces
quadword-aligned access (by ignoring the low-order 4 bits of the effective address). The
<emphasis role="bold"><literal>lvx</literal></emphasis> instruction does not raise
alignment exceptions, but perhaps our implementation of the Intel intrinsic should.
This requires that we use
PowerISA VMX/VSX built-ins to ensure we get the expected results.</para>
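<para>To make the consequence of ignoring the low-order 4 address bits concrete, the
following portable sketch models only the addressing behavior described above. It is
not the instruction itself and uses no POWER builtins; the names
<literal>model_v2df</literal> and <literal>model_lvx</literal> are hypothetical.</para>

<programlisting><![CDATA[
```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 2 x double vector type (GCC vector extension). */
typedef double model_v2df __attribute__((__vector_size__(16)));

/* Model of the addressing rule only: clear the low-order 4 bits of the
   effective address, then read the enclosing 16-byte quadword. */
static model_v2df
model_lvx (const void *p)
{
  uintptr_t ea = (uintptr_t) p & ~(uintptr_t) 0xF;
  model_v2df v;
  memcpy (&v, (const void *) ea, sizeof v);
  return v;
}
```
]]></programlisting>

<para>With a 16-byte-aligned array of doubles, passing the address of the second
element to <literal>model_lvx</literal> returns the first two elements, not the second
and third. A misaligned pointer therefore yields silently different data rather than a
fault, which is why an explicit alignment check is worth considering.</para>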
<para>The current prototype defines the following:
<programlisting><![CDATA[/* Load two DPFP values from P. The address must be 16-byte aligned. */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_load_pd (double const *__P)
{
  assert(((unsigned long)__P & 0xfUL) == 0UL);
  return ((__m128d)vec_ld(0, (__v16qu*)__P));
}

/* Load two DPFP values from P. The address need not be 16-byte aligned. */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_loadu_pd (double const *__P)
{
  return (vec_vsx_ld(0, __P));
}]]></programlisting></para>
<para>The aligned load intrinsic adds an assert that checks alignment
(to match the Intel semantics) and uses the GCC builtin
<emphasis role="bold"><literal>vec_ld</literal></emphasis> (which generates an
<emphasis role="bold"><literal>lvx</literal></emphasis>). The assert
generates extra code, but this can be eliminated by defining
<emphasis role="bold"><literal>NDEBUG</literal></emphasis> at compile time.
The unaligned load intrinsic uses the GCC builtin
<literal>vec_vsx_ld</literal> (on PPC64LE this generates
<emphasis role="bold"><literal>lxvd2x</literal></emphasis> /
<emphasis role="bold"><literal>xxswapd</literal></emphasis> for POWER8 and
simplifies to <emphasis role="bold"><literal>lxv</literal></emphasis>
or <emphasis role="bold"><literal>lxvx</literal></emphasis>
for POWER9). The same applies to <emphasis role="bold"><literal>_mm_store_pd</literal></emphasis> /
<emphasis role="bold"><literal>_mm_storeu_pd</literal></emphasis>, using
<emphasis role="bold"><literal>vec_st</literal></emphasis>
and <emphasis role="bold"><literal>vec_vsx_st</literal></emphasis>. These concepts extend to the
load/store intrinsics for vector float and vector int.</para>
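<para>The store counterparts might look roughly like the sketch below, which follows
the same pattern as the load prototype. This is an illustration under stated
assumptions, not the prototype's actual code: the names
<literal>sketch_m128d</literal>, <literal>sketch_store_pd</literal> and
<literal>sketch_storeu_pd</literal> are hypothetical, and the
<literal>memcpy</literal> fallback exists only so the example also compiles on
non-POWER hosts.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__powerpc64__) && defined(__VSX__)
#include <altivec.h>
typedef __vector double sketch_m128d;
#else
/* Fallback type so the sketch also builds on non-POWER hosts. */
typedef double sketch_m128d __attribute__((__vector_size__(16)));
#endif

/* Aligned store: check alignment first, then store with vec_st. */
static inline void
sketch_store_pd (double *p, sketch_m128d a)
{
  assert (((uintptr_t) p & 0xFu) == 0);  /* compiled out with -DNDEBUG */
#if defined(__powerpc64__) && defined(__VSX__)
  vec_st ((__vector unsigned char) a, 0, (__vector unsigned char *) p);
#else
  memcpy (p, &a, sizeof a);
#endif
}

/* Unaligned store: vec_vsx_st accepts any address. */
static inline void
sketch_storeu_pd (double *p, sketch_m128d a)
{
#if defined(__powerpc64__) && defined(__VSX__)
  vec_vsx_st (a, 0, p);
#else
  memcpy (p, &a, sizeof a);
#endif
}
```
]]></programlisting>

<para>As with the load prototype, building with <literal>-DNDEBUG</literal> removes
the alignment check from <literal>sketch_store_pd</literal>.</para>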

</section>