<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright (c) 2017 OpenPOWER Foundation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="sec_intel_intrinsic_types">
<title>The types used for intrinsics</title>

<para>The type system for Intel intrinsics is a little strange. For example,
from xmmintrin.h:
<programlisting><![CDATA[/* The Intel API is flexible enough that we must allow aliasing with other
   vector types, and their scalar components.  */
typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));

/* Internal data types for implementing the intrinsics.  */
typedef float __v4sf __attribute__ ((__vector_size__ (16)));]]></programlisting></para>
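
<para>A typical place where the two type sets meet is inside the body of an
intrinsic: the operands arrive as the API type, are cast to an internal type,
operated on, and the result is cast back. The following is a minimal sketch of
that pattern using GCC vector extensions; it is not copied from any header. The
names <literal>api_m128</literal>, <literal>impl_v4sf</literal>, and
<literal>sketch_add_ps</literal> are hypothetical stand-ins for
<literal>__m128</literal>, <literal>__v4sf</literal>, and
<literal>_mm_add_ps</literal>, chosen so the example cannot collide with the
real headers.
<programlisting language="c"><![CDATA[/* Hypothetical stand-ins for __m128 and __v4sf from xmmintrin.h.  */
typedef float api_m128 __attribute__ ((__vector_size__ (16), __may_alias__));
typedef float impl_v4sf __attribute__ ((__vector_size__ (16)));

/* Hypothetical analogue of _mm_add_ps: the prototype uses the API type,
   the body casts to the internal type so the generic '+' operator adds
   four float lanes, and the sum is cast back to the API type.  */
static inline api_m128
sketch_add_ps (api_m128 a, api_m128 b)
{
  return (api_m128) ((impl_v4sf) a + (impl_v4sf) b);
}]]></programlisting></para>
<para>For example, adding {1, 2, 3, 4} and {10, 20, 30, 40} with this sketch
yields {11, 22, 33, 44}.</para>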

<para>So there is one set of types that is used in the function prototypes
of the API and another set of internal types that is used in the implementation.
Notice the special attribute <literal>__may_alias__</literal>. From the GCC documentation:
<blockquote><para>
Accesses through pointers to types with this attribute are not subject
to type-based alias analysis, but are instead assumed to be able to alias any
other type of objects. ... This extension exists to support some vector APIs,
in which pointers to one vector type are permitted to alias pointers to a
different vector type.</para></blockquote></para>
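
<para>The effect of the attribute can be sketched in a few lines. Reading an
integer vector through a pointer to an ordinary float vector type would break
C's type-based aliasing rules, so the compiler would be free to assume the two
never refer to the same storage. Marking the float vector type
<literal>__may_alias__</literal>, as <literal>__m128</literal> is, tells GCC not
to make that assumption. The type and function names below are hypothetical,
and the example assumes IEEE-754 single-precision floats.
<programlisting language="c"><![CDATA[/* Hypothetical integer vector type with no special attributes.  */
typedef int v4si_plain __attribute__ ((__vector_size__ (16)));
/* Hypothetical float vector type marked __may_alias__, like __m128.  */
typedef float v4sf_alias __attribute__ ((__vector_size__ (16), __may_alias__));

/* Return the bits of lane 0 of an int vector reinterpreted as a float.
   Because v4sf_alias carries __may_alias__, this access through a
   pointer to a different vector type is not subject to type-based
   alias analysis.  */
static float
lane0_bits_as_float (const v4si_plain *bits)
{
  const v4sf_alias *p = (const v4sf_alias *) bits;
  return (*p)[0];
}]]></programlisting></para>
<para>With IEEE-754 floats, passing a vector whose lane 0 holds 0x3f800000
returns 1.0f.</para>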

<para>There are a couple of issues here:
<itemizedlist spacing="compact">
<listitem>
<para>The API seems to force the compiler to assume
aliasing of any parameter passed by reference.</para>
</listitem>
<listitem>
<para>The data type used at the interface may not be
the correct type for the implied operation.</para>
</listitem>
</itemizedlist>
Normally the compiler assumes that parameters of different types (for example,
vectors with different element sizes) do not overlap in storage, which allows
more optimization. However, parameters for all vector element sizes
[char | short | int | long] are passed and returned as the single type
<literal>__m128i</literal> (defined as vector long long).</para>
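
<para>The second issue can be made concrete. Since every integer element size
travels as <literal>__m128i</literal>, the implementation of each intrinsic must
re-cast its operands to the element type the operation actually works on. In
this sketch, which uses GCC vector extensions and hypothetical names modeled on
<literal>_mm_add_epi16</literal> and <literal>_mm_add_epi64</literal>, the same
128 bits are added once as eight unsigned 16-bit lanes and once as two unsigned
64-bit lanes. The results differ because carries stop at lane boundaries.
<programlisting language="c"><![CDATA[/* Hypothetical stand-in for __m128i: vector long long, may_alias.  */
typedef long long sketch_m128i __attribute__ ((__vector_size__ (16), __may_alias__));
/* Hypothetical internal types for 16-bit and 64-bit unsigned lanes.  */
typedef unsigned short sketch_v8hu __attribute__ ((__vector_size__ (16)));
typedef unsigned long long sketch_v2du __attribute__ ((__vector_size__ (16)));

/* Treat the 128 bits as eight 16-bit lanes: each lane wraps on its own.  */
static inline sketch_m128i
add_epi16_sketch (sketch_m128i a, sketch_m128i b)
{
  return (sketch_m128i) ((sketch_v8hu) a + (sketch_v8hu) b);
}

/* Treat the same 128 bits as two 64-bit lanes: carries cross 16-bit
   boundaries inside each 64-bit lane.  */
static inline sketch_m128i
add_epi64_sketch (sketch_m128i a, sketch_m128i b)
{
  return (sketch_m128i) ((sketch_v2du) a + (sketch_v2du) b);
}]]></programlisting></para>
<para>Adding 1 to a vector whose first 64-bit lane holds 0xFFFF gives 0 in that
lane with the 16-bit version (the 16-bit lane wraps) and 0x10000 with the
64-bit version.</para>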

<para>This may not matter when using x86 built-ins, but it does matter when
the implementation uses C vector extensions or, in our case, PowerPC generic
vector built-ins
(<xref linkend="sec_powerisa_vector_intrinsics"/>).
In these cases the operand type must be correct so that the compiler selects
the correct element type (char, short, int, long)
(<xref linkend="sec_api_implemented"/>) for the generic
built-in operation. There is also concern that excessive use of
<literal>__may_alias__</literal>
will limit compiler optimization. We are not sure how important this attribute
is to the correct operation of the API, so at a later stage we should
experiment with removing it from our implementation for PowerPC.</para>
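
<para>Element width is not the only property the cast must get right;
signedness selects a different operation as well. The hypothetical sketch below
uses C vector extensions rather than the PowerPC built-ins so that it compiles
with GCC on any target. It shifts the same <literal>__m128i</literal>-style
value right by one bit after casting to signed and to unsigned 16-bit lanes,
analogous to <literal>_mm_srai_epi16</literal> and
<literal>_mm_srli_epi16</literal>. With GCC, the signed cast yields an
arithmetic shift that replicates the sign bit, while the unsigned cast yields a
logical shift that shifts in zeros.
<programlisting language="c"><![CDATA[/* Hypothetical stand-in for __m128i and internal 16-bit lane types.  */
typedef long long sk_m128i __attribute__ ((__vector_size__ (16), __may_alias__));
typedef short sk_v8hi __attribute__ ((__vector_size__ (16)));
typedef unsigned short sk_v8hu __attribute__ ((__vector_size__ (16)));

/* Shift each 16-bit lane right by one, treating lanes as signed:
   GCC replicates the sign bit (arithmetic shift).  */
static inline sk_m128i
srai1_epi16_sketch (sk_m128i a)
{
  const sk_v8hi one = {1, 1, 1, 1, 1, 1, 1, 1};
  return (sk_m128i) ((sk_v8hi) a >> one);
}

/* Same bits, lanes treated as unsigned: zeros shift in (logical shift).  */
static inline sk_m128i
srli1_epi16_sketch (sk_m128i a)
{
  const sk_v8hu one = {1, 1, 1, 1, 1, 1, 1, 1};
  return (sk_m128i) ((sk_v8hu) a >> one);
}]]></programlisting></para>
<para>For a lane holding 0x8000, the signed version produces 0xC000 and the
unsigned version produces 0x4000.</para>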

<para>The good news is that PowerISA has good support for 128-bit vectors
and (with the addition of VSX) all the required vector data types (char, short,
int, long, float, double). However, Intel supports a wider variety of
vector sizes than PowerISA does. This started with the 64-bit MMX vector
support that preceded SSE and extends to the 256-bit and 512-bit vectors of AVX,
AVX2, and AVX512 that followed SSE.</para>

<para>Within the GCC Intel intrinsic implementation, these types are all
defined as vector attribute extensions of the appropriate size
(<literal>__vector_size__</literal> ({8 | 16 | 32 | 64})). For the PowerPC target, GCC currently
supports only the native <literal>__vector_size__</literal> (16) directly; vectors of this size
map onto VMX/VSX registers and the associated instructions. GCC will compile code with
other <literal>__vector_size__</literal> values, but the resulting types are treated as simple
arrays of the element type. This does not allow the compiler to use the vector
registers and vector instructions for these (nonnative) vectors.</para>
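
<para>Because the GCC vector extension accepts all of these sizes, one source
file can declare MMX-, SSE-, AVX-, and AVX512-sized types on any target. The
sketch below (type and function names are hypothetical) declares all four sizes
for int elements and adds two 32-byte vectors. On PowerPC only the 16-byte type
maps to a VMX/VSX register; as described above, the others are handled as
arrays of the element type. Loading and storing through
<literal>memcpy</literal> avoids alignment assumptions and keeps the wide
vector local to the function, since GCC can warn about ABI differences when
vectors wider than the target's native size are passed by value.
<programlisting language="c"><![CDATA[#include <string.h>

/* Hypothetical names; element type int (4 bytes).  */
typedef int vec8_t  __attribute__ ((__vector_size__ (8)));   /* MMX-sized   */
typedef int vec16_t __attribute__ ((__vector_size__ (16)));  /* SSE / VMX   */
typedef int vec32_t __attribute__ ((__vector_size__ (32)));  /* AVX / AVX2  */
typedef int vec64_t __attribute__ ((__vector_size__ (64)));  /* AVX512      */

/* Add two arrays of eight ints using a 32-byte vector, which is
   nonnative on PowerPC.  */
static void
add8_int (const int *a, const int *b, int *out)
{
  vec32_t va, vb, vr;
  memcpy (&va, a, sizeof va);
  memcpy (&vb, b, sizeof vb);
  vr = va + vb;
  memcpy (out, &vr, sizeof vr);
}]]></programlisting></para>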

<para>So the PowerISA VMX/VSX facilities and GCC compiler support for
128-bit/16-byte vectors and associated vector built-ins
are well matched to implementing equivalent x86 SSE intrinsic functions.
However, implementing the older MMX (64-bit) and the newer
AVX (256-bit and 512-bit) extensions requires more thought and some ingenuity.</para>

<xi:include href="sec_handling_mmx.xml"/>
<xi:include href="sec_handling_avx.xml"/>

</section>