Programming-Guides/Intrinsics_Reference/ch_mma_reference.xml

<!--
  Copyright (c) 2021 OpenPOWER Foundation

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.

-->
<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
 xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.mma-ref"
 revisionflag="added">

  <!-- Chapter Title goes here. -->
  <title>Matrix-Multiply Assist (MMA) Intrinsic Reference</title>

  <section>
    <title>Introduction</title>
    <para>
      Version 3.1 of the Power Instruction Set Architecture
      Specification (see <xref linkend="VIPR.intro.links" />)
      introduced instructions to accelerate matrix multiplication
      computations.  These instructions operate both on the VSRs and
      on eight new 512-bit accumulator registers (ACCs).  Intrinsic
      functions to access these instructions are described in this
      chapter.
    </para>
    <para>
      Although the ACCs are treated as separate registers from the
      VSRs, each <code>ACC[i]</code> may use its associated VSRs
      <code>4i</code> to <code>4i+3</code> as scratch space.  That is,
      when <code>ACC[i]</code> contains defined data, the contents of
      VSRs <code>4i</code> to <code>4i+3</code> are undefined until
      an <code>xxmfacc</code> instruction is used to copy the contents
      of <code>ACC[i]</code> to the VSRs.  Writing to a VSR associated
      with <code>ACC[i]</code> that contains defined data will cause
      <code>ACC[i]</code> to become undefined.
    </para>
    <para>
      This reference is not intended to be a complete introduction to
      MMA concepts.  The reader is directed to the Matrix-Multiply
      Assist Best Practices Guide (see <xref
      linkend="VIPR.intro.links" />) and to the POWER ISA.
    </para>
  </section>

  <section>
    <title>Type Support</title>
    <para>
      Many of the MMA instructions operate on aligned pairs of vectors
      (that is, an even numbered vector and the next-higher numbered
      vector), or on aligned quads of vectors (that is, a vector
      number divisible by four and the three next-higher numbered
      vectors).  Compilers that support the MMA intrinsic functions
      must define two types, <code>__vector_pair</code> and
      <code>__vector_quad</code>, to represent these concepts.
      Pointers and references to these types must also be supported
      where these concepts exist in the source language.
    </para>
  </section>

  <section>
    <title>Intrinsic Functions</title>
    <para>
      The intrinsics in this section are not overloaded.  Each is
      presented with its prototype and the instruction it represents.
      The string "vuc" is used as shorthand for "vector unsigned
      char" throughout.
    </para>
    <section>
      <title>Memory Access</title>
      <para>
	Load and store vector pairs.
      </para>
      <indexterm>
	<primary>lxvp</primary>
	<secondary>__builtin_vsx_lxvp</secondary>
      </indexterm>
      <indexterm>
	<primary>stxvp</primary>
	<secondary>__builtin_vsx_stxvp</secondary>
      </indexterm>
      <indexterm>
	<primary>lxvpx</primary>
	<secondary>__builtin_vsx_lxvp</secondary>
      </indexterm>
      <indexterm>
	<primary>stxvpx</primary>
	<secondary>__builtin_vsx_stxvp</secondary>
      </indexterm>
      <para>
	<informaltable frame="all">
	  <tgroup cols="2">
	    <colspec colname="c0" colwidth="40*" />
	    <colspec colname="c1" colwidth="10*" />
	    <thead>
	      <row>
		<entry align="center" valign="middle">
		  <para><emphasis role="bold">Prototype</emphasis></para>
		</entry>
		<entry align="center" valign="middle">
		  <para><emphasis role="bold">Instruction</emphasis></para>
		</entry>
	      </row>
	    </thead>
	    <tbody>
	      <row>
		<entry>
		  <programlisting>

  __vector_pair __builtin_vsx_lxvp (signed long a, const __vector_pair* b)

		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
  lxvp r,a(b)
      or
  lxvpx r,b,a
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>

  void __builtin_vsx_stxvp (__vector_pair s, signed long a, const __vector_pair* b)

		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
  stxvp s,a(b)
      or
  stxvpx s,b,a
		  </programlisting>
		</entry>
	      </row>
	    </tbody>
	  </tgroup>
	</informaltable>
      </para>
    </section>
    <section>
      <title>Assembly and Disassembly of Large Types</title>
      <para>
	The following intrinsics are used to construct
	<code>__vector_pair</code> and <code>__vector_quad</code>
	objects from 128-bit vectors, and deconstruct them into such
	vectors.  The disassembly interfaces place the results into
	arrays of vectors using natural element order.  The build
	interfaces treat the vector input arguments as if they form an
	array of vectors, with the first vector argument being array
	element 0 in natural element order, the second vector argument
	being array element 1, and so forth.  The assemble interfaces
	are deprecated because they do not give consistent results for
	big- and little-endian targets, and users should use the build
	interfaces instead.
      </para>
      <para>
	<informaltable frame="all">
	  <tgroup cols="2">
	    <colspec colname="c0" colwidth="40*" />
	    <colspec colname="c0" colwidth="10*" />
	    <thead>
	      <row>
		<entry align="center" valign="middle">
		  <para><emphasis role="bold">Prototype</emphasis></para>
		</entry>
		<entry align="center" valign="middle">
		  <para><emphasis role="bold">Notes</emphasis></para>
		</entry>
	      </row>
	    </thead>
	    <tbody>
	      <row>
		<entry>
		  <programlisting>
  void __builtin_mma_assemble_acc (__vector_quad*, vuc, vuc, vuc, vuc)
		  </programlisting>
		</entry>
		<entry align="center" valign="middle">
		  <para>Deprecated</para>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
  void __builtin_mma_build_acc (__vector_quad*, vuc, vuc, vuc, vuc)
		  </programlisting>
		</entry>
		<entry align="center" valign="middle">
		  <para></para>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
  void __builtin_mma_disassemble_acc (void*, __vector_quad*)
		  </programlisting>
		</entry>
		<entry align="center" valign="middle">
		  <para></para>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
  void __builtin_vsx_assemble_pair (__vector_pair*, vuc, vuc)
		  </programlisting>
		</entry>
		<entry align="center" valign="middle">
		  <para>Deprecated</para>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
  void __builtin_vsx_build_pair (__vector_pair*, vuc, vuc)
		  </programlisting>
		</entry>
		<entry align="center" valign="middle">
		  <para></para>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
  void __builtin_vsx_disassemble_pair (void*, __vector_pair*)
		  </programlisting>
		</entry>
		<entry align="center" valign="middle">
		  <para></para>
		</entry>
	      </row>
	    </tbody>
	  </tgroup>
	</informaltable>
      </para>
    </section>
    <section>
      <title>Accumulator Clear Operation</title>
      <para>
	This intrinsic function initializes an accumulator to zeros.
      </para>
      <indexterm>
	<primary>xxsetaccz</primary>
	<secondary>__builtin_mma_xxsetaccz</secondary>
      </indexterm>
      <para>
	<informaltable frame="all">
	  <tgroup cols="2">
	    <colspec colname="c0" colwidth="28*" />
	    <colspec colname="c0" colwidth="10*" />
	    <thead>
	      <row>
		<entry align="center" valign="middle">
		  <para><emphasis role="bold">Prototype</emphasis></para>
		</entry>
		<entry align="center" valign="middle">
		  <para><emphasis role="bold">Instruction</emphasis></para>
		</entry>
	      </row>
	    </thead>
	    <tbody>
	      <row>
		<entry>
		  <programlisting>
  void __builtin_mma_xxsetaccz (__vector_quad* a)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
  xxsetaccz a
		  </programlisting>
		</entry>
	      </row>
	    </tbody>
	  </tgroup>
	</informaltable>
      </para>
    </section>
    <section>
      <title>Conversion Operations</title>
      <para>
	These instructions convert between vectors of single precision
	and bfloat16 types.
      </para>
      <indexterm>
	<primary>xvcvbf16spn</primary>
	<secondary>__builtin_vsx_xvcvbf16spn</secondary>
      </indexterm>
      <indexterm>
	<primary>xvcvspbf16</primary>
	<secondary>__builtin_vsx_xvcvspbf16</secondary>
      </indexterm>
      <para>
	<informaltable frame="all">
	  <tgroup cols="2">
	    <colspec colname="c0" colwidth="28*" />
	    <colspec colname="c0" colwidth="10*" />
	    <thead>
	      <row>
		<entry align="center" valign="middle">
		  <para><emphasis role="bold">Prototype</emphasis></para>
		</entry>
		<entry align="center" valign="middle">
		  <para><emphasis role="bold">Instruction</emphasis></para>
		</entry>
	      </row>
	    </thead>
	    <tbody>
	      <row>
		<entry>
		  <programlisting>
  vuc __builtin_vsx_xvcvbf16spn (vuc a)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
  xvcvbf16spn a
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
  vuc __builtin_vsx_xvcvspbf16 (vuc a)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
  xvcvspbf16 a
		  </programlisting>
		</entry>
	      </row>
	    </tbody>
	  </tgroup>
	</informaltable>
      </para>
    </section>
    <section>
      <title>Outer Product Operations</title>
      <para>
	Each of these intrinsics generates an instruction to perform
	an outer product operation.
      </para>
      <indexterm>
	<primary>pmxvbf16ger2</primary>
	<secondary>__builtin_mma_pmxvbf16ger2</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvbf16ger2nn</primary>
	<secondary>__builtin_mma_pmxvbf16ger2nn</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvbf16ger2np</primary>
	<secondary>__builtin_mma_pmxvbf16ger2np</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvbf16ger2pn</primary>
	<secondary>__builtin_mma_pmxvbf16ger2pn</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvbf16ger2pp</primary>
	<secondary>__builtin_mma_pmxvbf16ger2pp</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf16ger2</primary>
	<secondary>__builtin_mma_pmxvf16ger2</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf16ger2nn</primary>
	<secondary>__builtin_mma_pmxvf16ger2nn</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf16ger2np</primary>
	<secondary>__builtin_mma_pmxvf16ger2np</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf16ger2pn</primary>
	<secondary>__builtin_mma_pmxvf16ger2pn</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf16ger2pp</primary>
	<secondary>__builtin_mma_pmxvf16ger2pp</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf32ger</primary>
	<secondary>__builtin_mma_pmxvf32ger</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf32gernn</primary>
	<secondary>__builtin_mma_pmxvf32gernn</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf32gernp</primary>
	<secondary>__builtin_mma_pmxvf32gernp</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf32gerpn</primary>
	<secondary>__builtin_mma_pmxvf32gerpn</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf32gerpp</primary>
	<secondary>__builtin_mma_pmxvf32gerpp</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf64ger</primary>
	<secondary>__builtin_mma_pmxvf64ger</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf64gernn</primary>
	<secondary>__builtin_mma_pmxvf64gernn</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf64gernp</primary>
	<secondary>__builtin_mma_pmxvf64gernp</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf64gerpn</primary>
	<secondary>__builtin_mma_pmxvf64gerpn</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvf64gerpp</primary>
	<secondary>__builtin_mma_pmxvf64gerpp</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvi16ger2</primary>
	<secondary>__builtin_mma_pmxvi16ger2</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvi16ger2pp</primary>
	<secondary>__builtin_mma_pmxvi16ger2pp</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvi16ger2s</primary>
	<secondary>__builtin_mma_pmxvi16ger2s</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvi16ger2spp</primary>
	<secondary>__builtin_mma_pmxvi16ger2spp</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvi4ger8</primary>
	<secondary>__builtin_mma_pmxvi4ger8</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvi4ger8pp</primary>
	<secondary>__builtin_mma_pmxvi4ger8pp</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvi8ger4</primary>
	<secondary>__builtin_mma_pmxvi8ger4</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvi8ger4pp</primary>
	<secondary>__builtin_mma_pmxvi8ger4pp</secondary>
      </indexterm>
      <indexterm>
	<primary>pmxvi8ger4spp</primary>
	<secondary>__builtin_mma_pmxvi8ger4spp</secondary>
      </indexterm>
      <indexterm>
	<primary>xvbf16ger2</primary>
	<secondary>__builtin_mma_xvbf16ger2</secondary>
      </indexterm>
      <indexterm>
	<primary>xvbf16ger2nn</primary>
	<secondary>__builtin_mma_xvbf16ger2nn</secondary>
      </indexterm>
      <indexterm>
	<primary>xvbf16ger2np</primary>
	<secondary>__builtin_mma_xvbf16ger2np</secondary>
      </indexterm>
      <indexterm>
	<primary>xvbf16ger2pn</primary>
	<secondary>__builtin_mma_xvbf16ger2pn</secondary>
      </indexterm>
      <indexterm>
	<primary>xvbf16ger2pp</primary>
	<secondary>__builtin_mma_xvbf16ger2pp</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf16ger2</primary>
	<secondary>__builtin_mma_xvf16ger2</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf16ger2nn</primary>
	<secondary>__builtin_mma_xvf16ger2nn</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf16ger2np</primary>
	<secondary>__builtin_mma_xvf16ger2np</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf16ger2pn</primary>
	<secondary>__builtin_mma_xvf16ger2pn</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf16ger2pp</primary>
	<secondary>__builtin_mma_xvf16ger2pp</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf32ger</primary>
	<secondary>__builtin_mma_xvf32ger</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf32gernn</primary>
	<secondary>__builtin_mma_xvf32gernn</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf32gernp</primary>
	<secondary>__builtin_mma_xvf32gernp</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf32gerpn</primary>
	<secondary>__builtin_mma_xvf32gerpn</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf32gerpp</primary>
	<secondary>__builtin_mma_xvf32gerpp</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf64ger</primary>
	<secondary>__builtin_mma_xvf64ger</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf64gernn</primary>
	<secondary>__builtin_mma_xvf64gernn</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf64gernp</primary>
	<secondary>__builtin_mma_xvf64gernp</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf64gerpn</primary>
	<secondary>__builtin_mma_xvf64gerpn</secondary>
      </indexterm>
      <indexterm>
	<primary>xvf64gerpp</primary>
	<secondary>__builtin_mma_xvf64gerpp</secondary>
      </indexterm>
      <indexterm>
	<primary>xvi16ger2</primary>
	<secondary>__builtin_mma_xvi16ger2</secondary>
      </indexterm>
      <indexterm>
	<primary>xvi16ger2pp</primary>
	<secondary>__builtin_mma_xvi16ger2pp</secondary>
      </indexterm>
      <indexterm>
	<primary>xvi16ger2s</primary>
	<secondary>__builtin_mma_xvi16ger2s</secondary>
      </indexterm>
      <indexterm>
	<primary>xvi16ger2spp</primary>
	<secondary>__builtin_mma_xvi16ger2spp</secondary>
      </indexterm>
      <indexterm>
	<primary>xvi4ger8</primary>
	<secondary>__builtin_mma_xvi4ger8</secondary>
      </indexterm>
      <indexterm>
	<primary>xvi4ger8pp</primary>
	<secondary>__builtin_mma_xvi4ger8pp</secondary>
      </indexterm>
      <indexterm>
	<primary>xvi8ger4</primary>
	<secondary>__builtin_mma_xvi8ger4</secondary>
      </indexterm>
      <indexterm>
	<primary>xvi8ger4pp</primary>
	<secondary>__builtin_mma_xvi8ger4pp</secondary>
      </indexterm>
      <indexterm>
	<primary>xvi8ger4spp</primary>
	<secondary>__builtin_mma_xvi8ger4spp</secondary>
      </indexterm>
      <para>
	<informaltable frame="all">
	  <tgroup cols="2">
	    <colspec colname="c0" colwidth="28*" />
	    <colspec colname="c1" colwidth="10*" />
	    <thead>
	      <row>
		<entry align="center" valign="middle">
		  <para><emphasis role="bold">Prototype</emphasis></para>
		</entry>
		<entry align="center" valign="middle">
		  <para><emphasis role="bold">Instruction</emphasis></para>
		</entry>
	      </row>
	    </thead>
	    <tbody>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvbf16ger2 (__vector_quad* a, vuc b, vuc c,
                                  const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvbf16ger2  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvbf16ger2nn (__vector_quad* a, vuc b, vuc c,
                                    const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvbf16ger2nn  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvbf16ger2np (__vector_quad* a, vuc b, vuc c,
                                    const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvbf16ger2np  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvbf16ger2pn (__vector_quad* a, vuc b, vuc c,
                                    const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvbf16ger2pn  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvbf16ger2pp (__vector_quad* a, vuc b, vuc c,
                                    const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvbf16ger2pp  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf16ger2 (__vector_quad* a, vuc b, vuc c,
                                 const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf16ger2  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf16ger2nn (__vector_quad* a, vuc b, vuc c,
                                   const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf16ger2nn  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf16ger2np (__vector_quad* a, vuc b, vuc c,
                                   const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf16ger2np  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf16ger2pn (__vector_quad* a, vuc b, vuc c,
                                   const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf16ger2pn  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf16ger2pp (__vector_quad* a, vuc b, vuc c,
                                   const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf16ger2pp  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf32ger (__vector_quad* a, vuc b, vuc c,
                                const int d, const int e)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf32ger  a,b,c,d,e

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf32gernn (__vector_quad* a, vuc b, vuc c,
                                  const int d, const int e)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf32gernn  a,b,c,d,e

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf32gernp (__vector_quad* a, vuc b, vuc c,
                                  const int d, const int e)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf32gernp  a,b,c,d,e

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf32gerpn (__vector_quad* a, vuc b, vuc c,
                                  const int d, const int e)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf32gerpn  a,b,c,d,e

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf32gerpp (__vector_quad* a, vuc b, vuc c,
                                  const int d, const int e)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf32gerpp  a,b,c,d,e

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf64ger (__vector_quad* a, __vector_pair b,
                                vuc c, const int d, const int e)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf64ger  a,b,c,d,e

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf64gernn (__vector_quad* a, __vector_pair b,
                                  vuc c, const int d, const int e)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf64gernn  a,b,c,d,e

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf64gernp (__vector_quad* a, __vector_pair b,
                                  vuc c, const int d, const int e)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf64gernp  a,b,c,d,e

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf64gerpn (__vector_quad* a, __vector_pair b,
                                  vuc c, const int d, const int e)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf64gerpn  a,b,c,d,e

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvf64gerpp (__vector_quad* a, __vector_pair b,
                                  vuc c, const int d, const int e)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvf64gerpp  a,b,c,d,e

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvi16ger2 (__vector_quad* a, vuc b, vuc c,
                                 const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvi16ger2  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvi16ger2pp (__vector_quad* a, vuc b, vuc c,
                                   const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvi16ger2pp  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvi16ger2s (__vector_quad* a, vuc b, vuc c,
                                  const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvi16ger2s  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvi16ger2spp (__vector_quad* a, vuc b, vuc c,
                                    const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvi16ger2spp  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvi4ger8 (__vector_quad* a, vuc b, vuc c,
                                const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvi4ger8  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvi4ger8pp (__vector_quad* a, vuc b, vuc c,
                                  const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvi4ger8pp  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvi8ger4 (__vector_quad* a, vuc b, vuc c,
                                const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvi8ger4  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvi8ger4pp (__vector_quad* a, vuc b, vuc c,
                                  const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvi8ger4pp  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_pmxvi8ger4spp (__vector_quad* a, vuc b, vuc c,
                                   const int d, const int e, const int f)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 pmxvi8ger4spp  a,b,c,d,e,f

		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvbf16ger2 (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvbf16ger2  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvbf16ger2nn (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvbf16ger2nn  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvbf16ger2np (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvbf16ger2np  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvbf16ger2pn (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvbf16ger2pn  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvbf16ger2pp (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvbf16ger2pp  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf16ger2 (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf16ger2  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf16ger2nn (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf16ger2nn  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf16ger2np (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf16ger2np  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf16ger2pn (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf16ger2pn  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf16ger2pp (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf16ger2pp  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf32ger (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf32ger  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf32gernn (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf32gernn  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf32gernp (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf32gernp  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf32gerpn (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf32gerpn  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf32gerpp (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf32gerpp  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf64ger (__vector_quad* a, __vector_pair b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf64ger  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf64gernn (__vector_quad* a, __vector_pair b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf64gernn  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf64gernp (__vector_quad* a, __vector_pair b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf64gernp  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf64gerpn (__vector_quad* a, __vector_pair b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf64gerpn  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvf64gerpp (__vector_quad* a, __vector_pair b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvf64gerpp  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvi16ger2 (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvi16ger2  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvi16ger2pp (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvi16ger2pp  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvi16ger2s (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvi16ger2s  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvi16ger2spp (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvi16ger2spp  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvi4ger8 (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvi4ger8  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvi4ger8pp (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvi4ger8pp  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvi8ger4 (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvi8ger4  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvi8ger4pp (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvi8ger4pp  a,b,c
		  </programlisting>
		</entry>
	      </row>
	      <row>
		<entry>
		  <programlisting>
 void __builtin_mma_xvi8ger4spp (__vector_quad* a, vuc b, vuc c)
		  </programlisting>
		</entry>
		<entry>
		  <programlisting>
 xvi8ger4spp  a,b,c
		  </programlisting>
		</entry>
	      </row>
	    </tbody>
	  </tgroup>
	</informaltable>
      </para>
    </section>
  </section>

</chapter>