

# Matrix-Multiply Assist (MMA) Intrinsic Reference
## Introduction

Version 3.1 of the Power Instruction Set Architecture introduced instructions to accelerate matrix multiplication computations. These instructions operate both on the VSRs and on eight new 512-bit accumulator registers (ACCs). Intrinsic functions to access these instructions are described in this chapter.

Although the ACCs are treated as separate registers from the VSRs, each ACC[i] may use its associated VSRs 4i to 4i+3 as scratch space. That is, when ACC[i] contains defined data, the contents of VSRs 4i to 4i+3 are undefined until an `xxmfacc` instruction is used to copy the contents of ACC[i] to the VSRs. Writing to a VSR associated with ACC[i] while ACC[i] contains defined data causes ACC[i] to become undefined.

This reference is not intended to be a complete introduction to MMA concepts. The reader is directed to the Matrix-Multiply Assist Best Practices Guide and to the Power ISA.
## Type Support

Many of the MMA instructions operate on aligned pairs of vectors (that is, an even-numbered vector and the next-higher-numbered vector) or on aligned quads of vectors (that is, a vector whose number is divisible by four and the three next-higher-numbered vectors). Compilers that support the MMA intrinsic functions must define two types, `__vector_pair` and `__vector_quad`, to represent these concepts. Pointers and references to these types must also be supported where these concepts exist in the source language.
## Intrinsic Functions

The intrinsics in this section are not overloaded. Each is presented with its prototype and the instruction it represents. The string `vuc` is used as shorthand for `vector unsigned char` throughout.
### Memory Access

These intrinsics load and store vector pairs.

| Prototype | Instruction |
|---|---|
| `__vector_pair __builtin_vsx_lxvp (signed long a, const __vector_pair* b)` | `lxvp r,a(b)` or `lxvpx r,b,a` |
| `void __builtin_vsx_stxvp (__vector_pair s, signed long a, const __vector_pair* b)` | `stxvp s,a(b)` or `stxvpx s,b,a` |
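As a minimal sketch of these intrinsics, the following copies an array of vector pairs 32 bytes at a time. The helper name `copy_pairs` is hypothetical, and the code assumes a POWER10 target (for example, compiled with `-mcpu=power10`); the compiler chooses between the displacement and indexed instruction forms.

```c
#include <altivec.h>

/* Copy n __vector_pair objects (32 bytes each) from src to dst.
   The builtin calls become lxvp/stxvp (or lxvpx/stxvpx). */
void copy_pairs (const __vector_pair *src, __vector_pair *dst, long n)
{
  for (long i = 0; i < n; i++)
    {
      __vector_pair p = __builtin_vsx_lxvp (32 * i, src);
      __builtin_vsx_stxvp (p, 32 * i, dst);
    }
}
```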
### Assembly and Disassembly of Large Types

The following intrinsics construct `__vector_pair` and `__vector_quad` objects from 128-bit vectors, and deconstruct them into such vectors. The disassembly interfaces place the results into arrays of vectors in natural element order. The build interfaces treat the vector input arguments as if they form an array of vectors, with the first vector argument being array element 0 in natural element order, the second vector argument being array element 1, and so forth. The assemble interfaces are deprecated because they do not give consistent results for big- and little-endian targets; users should use the build interfaces instead.

| Prototype | Notes |
|---|---|
| `void __builtin_mma_assemble_acc (__vector_quad*, vuc, vuc, vuc, vuc)` | Deprecated |
| `void __builtin_mma_build_acc (__vector_quad*, vuc, vuc, vuc, vuc)` | |
| `void __builtin_mma_disassemble_acc (void*, __vector_quad*)` | |
| `void __builtin_vsx_assemble_pair (__vector_pair*, vuc, vuc)` | Deprecated |
| `void __builtin_vsx_build_pair (__vector_pair*, vuc, vuc)` | |
| `void __builtin_vsx_disassemble_pair (void*, __vector_pair*)` | |
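A small sketch of the build/disassemble pattern, assuming a POWER10 target: four vectors are packed into an accumulator and then copied back out to the VSRs in natural element order. The helper name `roundtrip_acc` is hypothetical.

```c
#include <altivec.h>

typedef vector unsigned char vuc;

/* Pack four 128-bit vectors into an accumulator and unpack them
   again.  in[0] becomes array element 0 of the accumulator, and
   disassembly writes the four vectors back in natural order. */
void roundtrip_acc (vuc in[4], vuc out[4])
{
  __vector_quad acc;
  __builtin_mma_build_acc (&acc, in[0], in[1], in[2], in[3]);
  __builtin_mma_disassemble_acc (out, &acc);
}
```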
### Accumulator Clear Operation

This intrinsic function initializes an accumulator to zeros.

| Prototype | Instruction |
|---|---|
| `void __builtin_mma_xxsetaccz (__vector_quad* a)` | `xxsetaccz a` |
### Conversion Operations

These intrinsics convert between vectors of single-precision and bfloat16 values.

| Prototype | Instruction |
|---|---|
| `vuc __builtin_vsx_xvcvbf16spn (vuc a)` | `xvcvbf16spn a` |
| `vuc __builtin_vsx_xvcvspbf16 (vuc a)` | `xvcvspbf16 a` |
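The per-element operation behind these instructions can be sketched in portable scalar C: bfloat16 keeps the top 16 bits of an IEEE single-precision value, so widening is exact and narrowing requires rounding. This sketch uses round-to-nearest-even and does not special-case NaN; the instructions' exact rounding and NaN behavior are defined by the Power ISA, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Narrow a float to bfloat16 (the top 16 bits of the IEEE single-
   precision encoding), rounding to nearest even.  NaN inputs are
   not specially handled in this sketch. */
static uint16_t f32_to_bf16 (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  bits += 0x7FFF + ((bits >> 16) & 1);  /* round to nearest even */
  return (uint16_t) (bits >> 16);
}

/* Widen bfloat16 to float; this direction is exact. */
static float bf16_to_f32 (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```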
### Outer Product Operations

Each of these intrinsics generates the outer product instruction shown with its prototype.

| Prototype | Instruction |
|---|---|
| `void __builtin_mma_pmxvbf16ger2 (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvbf16ger2 a,b,c,d,e,f` |
| `void __builtin_mma_pmxvbf16ger2nn (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvbf16ger2nn a,b,c,d,e,f` |
| `void __builtin_mma_pmxvbf16ger2np (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvbf16ger2np a,b,c,d,e,f` |
| `void __builtin_mma_pmxvbf16ger2pn (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvbf16ger2pn a,b,c,d,e,f` |
| `void __builtin_mma_pmxvbf16ger2pp (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvbf16ger2pp a,b,c,d,e,f` |
| `void __builtin_mma_pmxvf16ger2 (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvf16ger2 a,b,c,d,e,f` |
| `void __builtin_mma_pmxvf16ger2nn (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvf16ger2nn a,b,c,d,e,f` |
| `void __builtin_mma_pmxvf16ger2np (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvf16ger2np a,b,c,d,e,f` |
| `void __builtin_mma_pmxvf16ger2pn (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvf16ger2pn a,b,c,d,e,f` |
| `void __builtin_mma_pmxvf16ger2pp (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvf16ger2pp a,b,c,d,e,f` |
| `void __builtin_mma_pmxvf32ger (__vector_quad* a, vuc b, vuc c, const int d, const int e)` | `pmxvf32ger a,b,c,d,e` |
| `void __builtin_mma_pmxvf32gernn (__vector_quad* a, vuc b, vuc c, const int d, const int e)` | `pmxvf32gernn a,b,c,d,e` |
| `void __builtin_mma_pmxvf32gernp (__vector_quad* a, vuc b, vuc c, const int d, const int e)` | `pmxvf32gernp a,b,c,d,e` |
| `void __builtin_mma_pmxvf32gerpn (__vector_quad* a, vuc b, vuc c, const int d, const int e)` | `pmxvf32gerpn a,b,c,d,e` |
| `void __builtin_mma_pmxvf32gerpp (__vector_quad* a, vuc b, vuc c, const int d, const int e)` | `pmxvf32gerpp a,b,c,d,e` |
| `void __builtin_mma_pmxvf64ger (__vector_quad* a, __vector_pair b, vuc c, const int d, const int e)` | `pmxvf64ger a,b,c,d,e` |
| `void __builtin_mma_pmxvf64gernn (__vector_quad* a, __vector_pair b, vuc c, const int d, const int e)` | `pmxvf64gernn a,b,c,d,e` |
| `void __builtin_mma_pmxvf64gernp (__vector_quad* a, __vector_pair b, vuc c, const int d, const int e)` | `pmxvf64gernp a,b,c,d,e` |
| `void __builtin_mma_pmxvf64gerpn (__vector_quad* a, __vector_pair b, vuc c, const int d, const int e)` | `pmxvf64gerpn a,b,c,d,e` |
| `void __builtin_mma_pmxvf64gerpp (__vector_quad* a, __vector_pair b, vuc c, const int d, const int e)` | `pmxvf64gerpp a,b,c,d,e` |
| `void __builtin_mma_pmxvi16ger2 (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvi16ger2 a,b,c,d,e,f` |
| `void __builtin_mma_pmxvi16ger2pp (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvi16ger2pp a,b,c,d,e,f` |
| `void __builtin_mma_pmxvi16ger2s (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvi16ger2s a,b,c,d,e,f` |
| `void __builtin_mma_pmxvi16ger2spp (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvi16ger2spp a,b,c,d,e,f` |
| `void __builtin_mma_pmxvi4ger8 (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvi4ger8 a,b,c,d,e,f` |
| `void __builtin_mma_pmxvi4ger8pp (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvi4ger8pp a,b,c,d,e,f` |
| `void __builtin_mma_pmxvi8ger4 (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvi8ger4 a,b,c,d,e,f` |
| `void __builtin_mma_pmxvi8ger4pp (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvi8ger4pp a,b,c,d,e,f` |
| `void __builtin_mma_pmxvi8ger4spp (__vector_quad* a, vuc b, vuc c, const int d, const int e, const int f)` | `pmxvi8ger4spp a,b,c,d,e,f` |
| `void __builtin_mma_xvbf16ger2 (__vector_quad* a, vuc b, vuc c)` | `xvbf16ger2 a,b,c` |
| `void __builtin_mma_xvbf16ger2nn (__vector_quad* a, vuc b, vuc c)` | `xvbf16ger2nn a,b,c` |
| `void __builtin_mma_xvbf16ger2np (__vector_quad* a, vuc b, vuc c)` | `xvbf16ger2np a,b,c` |
| `void __builtin_mma_xvbf16ger2pn (__vector_quad* a, vuc b, vuc c)` | `xvbf16ger2pn a,b,c` |
| `void __builtin_mma_xvbf16ger2pp (__vector_quad* a, vuc b, vuc c)` | `xvbf16ger2pp a,b,c` |
| `void __builtin_mma_xvf16ger2 (__vector_quad* a, vuc b, vuc c)` | `xvf16ger2 a,b,c` |
| `void __builtin_mma_xvf16ger2nn (__vector_quad* a, vuc b, vuc c)` | `xvf16ger2nn a,b,c` |
| `void __builtin_mma_xvf16ger2np (__vector_quad* a, vuc b, vuc c)` | `xvf16ger2np a,b,c` |
| `void __builtin_mma_xvf16ger2pn (__vector_quad* a, vuc b, vuc c)` | `xvf16ger2pn a,b,c` |
| `void __builtin_mma_xvf16ger2pp (__vector_quad* a, vuc b, vuc c)` | `xvf16ger2pp a,b,c` |
| `void __builtin_mma_xvf32ger (__vector_quad* a, vuc b, vuc c)` | `xvf32ger a,b,c` |
| `void __builtin_mma_xvf32gernn (__vector_quad* a, vuc b, vuc c)` | `xvf32gernn a,b,c` |
| `void __builtin_mma_xvf32gernp (__vector_quad* a, vuc b, vuc c)` | `xvf32gernp a,b,c` |
| `void __builtin_mma_xvf32gerpn (__vector_quad* a, vuc b, vuc c)` | `xvf32gerpn a,b,c` |
| `void __builtin_mma_xvf32gerpp (__vector_quad* a, vuc b, vuc c)` | `xvf32gerpp a,b,c` |
| `void __builtin_mma_xvf64ger (__vector_quad* a, __vector_pair b, vuc c)` | `xvf64ger a,b,c` |
| `void __builtin_mma_xvf64gernn (__vector_quad* a, __vector_pair b, vuc c)` | `xvf64gernn a,b,c` |
| `void __builtin_mma_xvf64gernp (__vector_quad* a, __vector_pair b, vuc c)` | `xvf64gernp a,b,c` |
| `void __builtin_mma_xvf64gerpn (__vector_quad* a, __vector_pair b, vuc c)` | `xvf64gerpn a,b,c` |
| `void __builtin_mma_xvf64gerpp (__vector_quad* a, __vector_pair b, vuc c)` | `xvf64gerpp a,b,c` |
| `void __builtin_mma_xvi16ger2 (__vector_quad* a, vuc b, vuc c)` | `xvi16ger2 a,b,c` |
| `void __builtin_mma_xvi16ger2pp (__vector_quad* a, vuc b, vuc c)` | `xvi16ger2pp a,b,c` |
| `void __builtin_mma_xvi16ger2s (__vector_quad* a, vuc b, vuc c)` | `xvi16ger2s a,b,c` |
| `void __builtin_mma_xvi16ger2spp (__vector_quad* a, vuc b, vuc c)` | `xvi16ger2spp a,b,c` |
| `void __builtin_mma_xvi4ger8 (__vector_quad* a, vuc b, vuc c)` | `xvi4ger8 a,b,c` |
| `void __builtin_mma_xvi4ger8pp (__vector_quad* a, vuc b, vuc c)` | `xvi4ger8pp a,b,c` |
| `void __builtin_mma_xvi8ger4 (__vector_quad* a, vuc b, vuc c)` | `xvi8ger4 a,b,c` |
| `void __builtin_mma_xvi8ger4pp (__vector_quad* a, vuc b, vuc c)` | `xvi8ger4pp a,b,c` |
| `void __builtin_mma_xvi8ger4spp (__vector_quad* a, vuc b, vuc c)` | `xvi8ger4spp a,b,c` |
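Putting the pieces together, the following is a minimal sketch of a 4x4 single-precision microkernel built from `xvf32gerpp`: the accumulator is cleared, a rank-1 outer product is accumulated per step, and the result is copied back to the VSRs with a disassembly intrinsic. The helper name `sgemm_4x4` and the data layout (each step supplies four elements of a column of A and four of a row of B) are illustrative assumptions, and the code requires a POWER10 target (for example, `-mcpu=power10`).

```c
#include <altivec.h>

typedef vector unsigned char vuc;

/* c[r][col] += sum over i of a[i][r] * b[i][col], using one
   xvf32gerpp rank-1 update per step i. */
void sgemm_4x4 (float c[4][4], vector float a[], vector float b[], int k)
{
  __vector_quad acc;
  vector float rows[4];

  __builtin_mma_xxsetaccz (&acc);            /* acc = 0 */
  for (int i = 0; i < k; i++)
    __builtin_mma_xvf32gerpp (&acc, (vuc) a[i], (vuc) b[i]);
  __builtin_mma_disassemble_acc (rows, &acc); /* ACC -> 4 VSRs */

  for (int r = 0; r < 4; r++)
    for (int col = 0; col < 4; col++)
      c[r][col] += rows[r][col];
}
```

Note that the accumulator is disassembled once after the loop rather than inside it; per the introduction, the VSRs backing an active accumulator hold undefined data, so results are only read out after all updates are complete.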