Programming-Guides/Vector_Intrinsics/sec_extra_attributes.xml

<?xml version="1.0" encoding="UTF-8"?>
<!--
  Copyright (c) 2017 OpenPOWER Foundation
  
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
  
-->
<section xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="5.0"
  xml:id="sec_extra_attributes">
  <title>Those extra attributes</title>
  
  <para>You may have noticed there are some special attributes:
  
  <literallayout>__gnu_inline__

This attribute should be used with a function that is also declared with the
inline keyword. It directs GCC to treat the function as if it were defined in
gnu90 mode even when compiling in C99 or gnu99 mode.

If the function is declared extern, then this definition of the function is used
only for inlining. In no case is the function compiled as a standalone function,
not even if you take its address explicitly. Such an address becomes an external
reference, as if you had only declared the function, and had not defined it. This
has almost the effect of a macro. The way to use this is to put a function
definition in a header file with this attribute, and put another copy of the
function, without extern, in a library file. The definition in the header file
causes most calls to the function to be inlined.

__always_inline__

Generally, functions are not inlined unless optimization is specified. For func-
tions declared inline, this attribute inlines the function independent of any
restrictions that otherwise apply to inlining. Failure to inline such a function
is diagnosed as an error.

__artificial__

This attribute is useful for small inline wrappers that if possible should appear
during debugging as a unit. Depending on the debug info format it either means
marking the function as artificial or using the caller location for all instructions
within the inlined body.

__extension__

... -pedantic’ and other options cause warnings for many GNU C extensions.
You can prevent such warnings within one expression by writing __extension__</literallayout></para>

  <para>So far I have been using these attributes unchanged.</para>

  <para>But most intrinsics map the Intel intrinsic to one or more target 
  specific GCC builtins. For example:
  <programlisting><![CDATA[/* Load two DPFP values from P.  The address must be 16-byte aligned.  */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_load_pd (double const *__P)
{
  return *(__m128d *)__P;
}

/* Load two DPFP values from P.  The address need not be 16-byte aligned.  */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_loadu_pd (double const *__P)
{
  return __builtin_ia32_loadupd (__P);
}]]></programlisting></para>

  <para>The first intrinsic (_mm_load_pd ) is implement as a C vector pointer 
  reference, but from the comment assumes the compiler will use a 
  <emphasis role="bold">movapd</emphasis>
  instruction that requires 16-byte alignment (will raise a general-protection 
  exception if not aligned). This  implies that there is a performance advantage 
  for at least some Intel processors to keep the vector aligned. The second 
  intrinsic uses the explicit GCC builtin 
  <emphasis role="bold"><literal>__builtin_ia32_loadupd</literal></emphasis> to generate the 
  <emphasis role="bold"><literal>movupd</literal></emphasis> instruction which handles unaligned references.</para>

  <para>The opposite assumption applies to POWER and PPC64LE, where GCC 
  generates the VSX <emphasis role="bold"><literal>lxvd2x</literal></emphasis> / 
  <emphasis role="bold"><literal>xxswapd</literal></emphasis> 
  instruction sequence by default, which 
  allows unaligned references. The PowerISA equivalent for aligned vector access 
  is the VMX <emphasis role="bold"><literal>lvx</literal></emphasis> instruction and the 
  <emphasis role="bold"><literal>vec_ld</literal></emphasis> builtin, which forces quadword 
  aligned access (by ignoring the low order 4 bits of the effective address). The 
  <emphasis role="bold"><literal>lvx</literal></emphasis> instruction does not raise 
  alignment exceptions, but perhaps should as part 
  of our implementation of the Intel intrinsic. This requires that we use 
  PowerISA VMX/VSX built-ins to insure we get the expected results.</para>

  <para>The current prototype defines the following:
  <programlisting><![CDATA[/* Load two DPFP values from P.  The address must be 16-byte aligned.  */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_load_pd (double const *__P)
{
  assert(((unsigned long)__P & 0xfUL) == 0UL);
  return ((__m128d)vec_ld(0, (__v16qu*)__P));
}

/* Load two DPFP values from P.  The address need not be 16-byte aligned.  */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_loadu_pd (double const *__P)
{
  return (vec_vsx_ld(0, __P));
}]]></programlisting></para>

  <para>The aligned  load intrinsic adds an assert which checks alignment 
  (to match the Intel semantic) and uses  the GCC builtin 
  <emphasis role="bold"><literal>vec_ld</literal></emphasis> (generates an 
  <emphasis role="bold"><literal>lvx</literal></emphasis>).  The assert 
  generates extra code but this can be eliminated by defining 
  <emphasis role="bold"><literal>NDEBUG</literal></emphasis> at compile time. 
  The unaligned load intrinsic uses the GCC builtin 
  <literal>vec_vsx_ld</literal>  (for PPC64LE generates 
  <emphasis role="bold"><literal>lxvd2x</literal></emphasis> / 
  <emphasis role="bold"><literal>xxswapd</literal></emphasis> for POWER8  and will 
  simplify to <emphasis role="bold"><literal>lxv</literal></emphasis> 
  or <emphasis role="bold"><literal>lxvx</literal></emphasis> 
  for POWER9).  And similarly for <emphasis role="bold"><literal>__mm_store_pd</literal></emphasis> / 
  <emphasis role="bold"><literal>__mm_storeu_pd</literal></emphasis>, using 
  <emphasis role="bold"><literal>vec_st</literal></emphasis>
  and <emphasis role="bold"><literal>vec_vsx_st</literal></emphasis>. These concepts extent to the 
  load/store intrinsics for vector float and vector int.</para>

</section>