Please format the intrinsics reference so it is machine parsable #2

Ideally, it would be great to either have custom attributes on the DocBook markup or, even better, have a rigid dialect that can then be converted to DocBook and fed to conformance verifiers such as the one implemented in [rust stdsimd](https://github.com/rust-lang-nursery/stdsimd/tree/master/crates/stdsimd-verify).

To elaborate on what @lu-zero mentioned: currently, the specification is in XML format (https://github.com/OpenPOWERFoundation/ELFv2-ABI/tree/master/specification), but due to the way the XML has been structured, it is really hard to process these files.

What we want to do is: read the spec file, find all intrinsics that are part of the C API, and extract their names, their argument names and types, their return types, whether any arguments are immediates, and their doc strings; that's pretty much it. I have no idea how to do that easily with the XML files provided.

OTOH, Intel makes this trivial (https://raw.githubusercontent.com/rust-lang-nursery/stdsimd/master/crates/stdsimd-verify/x86-intel.xml):

<intrinsic tech='MMX' rettype='__int64' name='_m_to_int64'>
	<type>Integer</type>
	<CPUID>MMX</CPUID>
	<category>Convert</category>
	<parameter varname='a' type='__m64'/>
	<description>Copy 64-bit integer "a" to "dst".</description>
	<operation>
dst[63:0] := a[63:0]
	</operation>
	<instruction name='movq' form='r64, mm'/>
	<header>mmintrin.h</header>
</intrinsic>
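
To illustrate why a record like this is easy to consume, here is a minimal sketch in Python using only the standard library. The function name `parse_intrinsic` and the keys of the returned dictionary are made up for illustration; only the element and attribute names come from the snippet above.

```python
import xml.etree.ElementTree as ET

# The Intel record quoted above, embedded verbatim for the sketch.
SAMPLE = """<intrinsic tech='MMX' rettype='__int64' name='_m_to_int64'>
    <type>Integer</type>
    <CPUID>MMX</CPUID>
    <category>Convert</category>
    <parameter varname='a' type='__m64'/>
    <description>Copy 64-bit integer "a" to "dst".</description>
    <operation>
dst[63:0] := a[63:0]
    </operation>
    <instruction name='movq' form='r64, mm'/>
    <header>mmintrin.h</header>
</intrinsic>"""


def parse_intrinsic(xml_text):
    """Extract the fields a verifier needs from one <intrinsic> element.

    Hypothetical helper: the dictionary layout is an assumption, not
    something defined by Intel's file or by stdsimd-verify.
    """
    node = ET.fromstring(xml_text)
    return {
        "name": node.get("name"),
        "return_type": node.get("rettype"),
        # Each <parameter> carries its own name and type as attributes.
        "params": [(p.get("varname"), p.get("type"))
                   for p in node.findall("parameter")],
        "doc": (node.findtext("description") or "").strip(),
        "instructions": [i.get("name") for i in node.findall("instruction")],
    }


info = parse_intrinsic(SAMPLE)
print(info["return_type"], info["name"], info["params"])
```

Every field sits in a named attribute or element, so no text scraping is needed.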

And ARM makes this easy as well (although not as easy as Intel):

<div class="intrinsic"><input id="vfmlslq_laneq_high_u32" type="checkbox"><label for="vfmlslq_laneq_high_u32"><div>float32x4_t <b><b>vfmlslq_laneq_high_u32</b></b> (float32x4_t r, float16x8_t a, float16x8_t b, const int lane)<span class="right">Floating-point fused Multiply-Subtract Long from accumulator</span></div></label><article>      <h4>A64 Instruction</h4><pre><a href="https://developer.arm.com/docs/100069/latest/a64-simd-vector-instructions/fmlsl-vector">FMLSL2</a> Vd.4S,Vn.4H,Vm.H[lane] </pre>      <h4>Argument Preparation</h4><pre>r &rarr; Vd.4S <br /> 0 &lt;&lt; lane &lt;&lt; 7 </pre>      <h4>Result</h4>      <pre>Vd.4S &rarr; result| </pre>  <h4>Supported architectures</h4>      <p>A32/A64</p> </article>  </div>
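
For comparison, a rough sketch of what the ARM page requires: the prototype is plain text spread over several tags, so it has to be collected and then split with a regular expression. This uses Python's standard `html.parser`; the class `PrototypeParser`, the function `parse_arm_entry`, and the regular expression are assumptions made for this example and only cover the single entry quoted above (shortened to its `<label>` part).

```python
import re
from html.parser import HTMLParser

# Shortened copy of the ARM entry above (only the <label> part is kept).
SAMPLE_HTML = (
    '<div class="intrinsic"><input id="vfmlslq_laneq_high_u32" type="checkbox">'
    '<label for="vfmlslq_laneq_high_u32"><div>float32x4_t '
    '<b><b>vfmlslq_laneq_high_u32</b></b> (float32x4_t r, float16x8_t a, '
    'float16x8_t b, const int lane)<span class="right">Floating-point fused '
    'Multiply-Subtract Long from accumulator</span></div></label></div>'
)


class PrototypeParser(HTMLParser):
    """Collect the prototype text and the summary text inside <label>."""

    def __init__(self):
        super().__init__()
        self.in_label = False
        self.in_summary = False
        self.prototype = []
        self.summary = []

    def handle_starttag(self, tag, attrs):
        if tag == "label":
            self.in_label = True
        elif tag == "span" and self.in_label:
            self.in_summary = True

    def handle_endtag(self, tag):
        if tag == "label":
            self.in_label = False
        elif tag == "span":
            self.in_summary = False

    def handle_data(self, data):
        if self.in_summary:
            self.summary.append(data)
        elif self.in_label:
            self.prototype.append(data)


# return type, function name, then the parenthesised argument list
PROTO_RE = re.compile(r"^(?P<ret>.+?)\s+(?P<name>\w+)\s*\((?P<args>.*)\)$")


def parse_arm_entry(html):
    parser = PrototypeParser()
    parser.feed(html)
    match = PROTO_RE.match("".join(parser.prototype).strip())
    if match is None:
        raise ValueError("prototype not recognised")
    params = []
    for arg in match.group("args").split(","):
        # The last word is the parameter name, everything before it the type.
        ty, _, name = arg.strip().rpartition(" ")
        params.append((name, ty))
    return {
        "name": match.group("name"),
        "return_type": match.group("ret"),
        "params": params,
        "doc": "".join(parser.summary).strip(),
    }


entry = parse_arm_entry(SAMPLE_HTML)
print(entry["name"], entry["params"])
```

It works, but the parser depends on presentation details (which tag wraps what) rather than on named fields, which is the "not as easy as Intel" part.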

I could imagine that if somebody hadn't already done the manual labor, LLVM would actually prefer to generate TableGen files from the spec, which would then be used to generate the headers.

Hi, folks...

I appreciate the suggestions. Unfortunately this isn't an "official" project, but
something that I try to work on in what little spare time I have. (It's taken me
over a year to get to the point that it's at, and I'm hopeful that I can finish a
first pass of the document by the end of the year, but can't commit to it.)

The XML format follows the OpenPOWER Foundation documentation guidelines. I'm not
an XML expert, so modifying XML schemas and so forth would require research time
that I don't have. So without contributions from others, this is the form that a
first version of the document is going to take.

If you would like to contribute the necessary XML code to format intrinsics in a
way that's convenient to your needs, I can look into adding that as time goes on.
Just be aware of my constraints; when things get busy, there are sometimes gaps
of several months between my own contributions.

Thanks,
Bill

On 8/20/18 6:23 AM, gnzlbg wrote:

> To elaborate on what @lu-zero mentioned, currently, the specification
> is in XML format
> (https://github.com/OpenPOWERFoundation/ELFv2-ABI/tree/master/specification)
> but due to the current way in which the XML its been structured, it is
> really hard to process these files.
> [...]

This remains desirable, and there's some discussion about a machine-readable file to drive some of these documents. That probably won't happen before first publication of this, though.

Would a JSON format like this be sufficient?

    {
        "mnemonic": "vec_abs",
        "name": "Vector Absolute Value",
        "syntax": "r = vec_abs (a)",
        "purpose": "Returns a vector r that contains the\n      absolute values of the contents of the vector\n      a.",
        "result": "The value of each element of r is the\n      absolute value of the corresponding element of\n      a.  For integer vectors, the arithmetic\n      is modular.",
        "endianness": "None.",
        "instructions": [ "vspltisw", "vsububm", "vmaxsb", "vsubuwm", "vmaxsw", "vsubudm", "vmaxsd", "xvabssp", "xvabsdp" ],
        "type_signatures": {
            "var_heads": [ "r", "a", "Example Implementation" ],
            "list": [
                [ "vector signed char", "vector signed char", "vspltisw t,0\n   vsububm  t,t,a\n   vmaxsb   r,t,a" ],
                [ "vector signed short", "vector signed short", "vspltisw t,0\n   vsubuhm  t,t,a\n   vmaxsh   r,t,a" ],
                [ "vector signed int", "vector signed int", "vspltisw t,0\n   vsubuwm  t,t,a\n   vmaxsw   r,t,a" ],
                [ "vector signed long long", "vector signed long long", "vspltisw t,0\n   vsubudm  t,t,a\n   vmaxsd   r,t,a" ],
                [ "vector float", "vector float", "xvabssp  r,a" ],
                [ "vector double", "vector double", "xvabsdp  r,a" ]
            ]
        }
    }
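
As a quick check of whether such a record carries enough information, here is a small Python sketch that turns it into C-style prototypes, one per row of `type_signatures`. It assumes that the `r` column is always the result type and that the `Example Implementation` column is not a parameter; both are guesses from this single example. The helper name `prototypes` is made up, and the record is cut down to two rows.

```python
import json

# Reduced copy of the record above: two of the six type-signature rows.
RECORD = json.loads(r'''
{
    "mnemonic": "vec_abs",
    "name": "Vector Absolute Value",
    "type_signatures": {
        "var_heads": [ "r", "a", "Example Implementation" ],
        "list": [
            [ "vector signed char", "vector signed char", "vspltisw t,0\n   vsububm  t,t,a\n   vmaxsb   r,t,a" ],
            [ "vector float", "vector float", "xvabssp  r,a" ]
        ]
    }
}
''')


def prototypes(record):
    """Build one C-style prototype string per type-signature row."""
    heads = record["type_signatures"]["var_heads"]
    result = []
    for row in record["type_signatures"]["list"]:
        columns = dict(zip(heads, row))
        return_type = columns.pop("r")                 # assumed: result column
        columns.pop("Example Implementation", None)    # assumed: not a parameter
        args = ", ".join(f"{ty} {name}" for name, ty in columns.items())
        result.append(f"{return_type} {record['mnemonic']}({args});")
    return result


for line in prototypes(RECORD):
    print(line)
```

Names, argument types, and return types come out directly; whether an argument must be an immediate is not recoverable from these fields as shown.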

Something that could be fed to a tool like https://github.com/rust-lang/stdarch/tree/master/crates/stdarch-gen would be optimal. That JSON looks like something that could be used to speed up the move from hand-crafted to automatically generated code.

Thanks for the pointer! It looks like there are a couple of files relevant to this discussion:

- `neon.spec`
- `src/main.rs`

For `neon.spec`, there are stanzas, and a representative one is:

/// Vector bitwise or (immediate, inclusive)
name = vorr
fn = simd_or
arm = vorr
aarch64 = orr
a = 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F
b = 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
validate 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F
generate int*_t, uint*_t, int64x*_t, uint64x*_t

I presume an appropriate new mapping would be something like:

- `name`: defined as "The prefix of the function", which leaves a lot to the imagination ;-) I don't yet understand this.
- `fn`: defined as "The function to call in rust-land.", which I don't yet understand.
- `arm`, `aarch64`: seem to be the name of the intrinsic in the respective ISAs? This might be akin to `mnemonic` in the JSON above?
- `a`, `b`: testing input data, presumably associated with the input parameter types; maps to fields like `a` and the associated types for `a` in `list` in the JSON; likely hand-crafted?
- `validate`: testing output data, presumably associated with the output type; maps to `r` and the associated types for `r` in `list` in the JSON; in the absence of algorithmic information in the `spec` file, this is also hand-crafted?
- `generate`: this seems to be types, although it's not completely clear... is it assuming that `a` and `b` are the same type, which could be any of the 4 listed types?
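
To make the shape of a stanza concrete, here is a Python sketch of a reader for it. The grammar is inferred only from the one stanza quoted above (`///` doc lines, `key = value` lines, and `validate` / `generate` lines without `=`); the real `stdarch-gen` parser is written in Rust and may accept more than this. The name `parse_stanza` is made up, and the test vectors are shortened to four values.

```python
# The stanza quoted above, with the 16-element vectors cut to 4 values.
STANZA = """\
/// Vector bitwise or (immediate, inclusive)
name = vorr
fn = simd_or
arm = vorr
aarch64 = orr
a = 0x00, 0x01, 0x02, 0x03
b = 0x00, 0x00, 0x00, 0x00
validate 0x00, 0x01, 0x02, 0x03
generate int*_t, uint*_t, int64x*_t, uint64x*_t
"""


def parse_stanza(text):
    """Parse one stanza into a dictionary (grammar is an assumption)."""
    entry = {"doc": []}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("///"):
            entry["doc"].append(line[3:].strip())
        elif line.startswith(("validate ", "generate ")):
            # These two keywords are followed directly by a comma list.
            key, _, rest = line.partition(" ")
            entry[key] = [item.strip() for item in rest.split(",")]
        elif "=" in line:
            key, _, value = line.partition("=")
            entry[key.strip()] = value.strip()
    return entry


stanza = parse_stanza(STANZA)
print(stanza["name"], stanza["fn"], stanza["generate"])
```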

For `src/main.rs`, this parses `neon.spec` and produces an appropriate `src/{arm,aarch64}/neon/generated.rs` file. This generated file contains...? I stumbled around, and it looks like it ends up at https://github.com/rust-lang/stdarch/blob/master/crates/core_arch/src/aarch64/neon/generated.rs, is that right? This appears to be implementations of the intrinsics, but I'm not sure how the spec file data gets transformed into an implementation.

Note: my expertise in Rust can be easily counted in minutes.

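The final endpoint is [core::arch](https://doc.rust-lang.org/core/arch/index.html) with the arch-specific module, e.g. the [arm](https://doc.rust-lang.org/core/arch/arm/index.html) one.

In the example we have [vorr](https://doc.rust-lang.org/core/arch/arm/fn.vorr_s8.html), which in source is [defined](https://doc.rust-lang.org/src/core/up/up/stdarch/crates/core_arch/src/arm/neon/generated.rs.html#178-180) as: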

/// Vector bitwise or (immediate, inclusive)
#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
#[cfg_attr(all(test, target_arch = "arm"), assert_instr(vorr))]
#[cfg_attr(all(test, target_arch = "aarch64"), assert_instr(orr))]
pub unsafe fn vorr_s8(a: int8x8_t, b: int8x8_t) -> int8x8_t {
    simd_or(a, b)
}

In stdarch we have two kinds of tests: one checks that the expected mnemonic appears when disassembling a test case that the `assert_instr()` decorator generates for us, and the other makes sure that the instruction itself behaves as expected; that's the `validate` entry.

The `fn` field is used for one-liners that might call any Rust code. `link-arm` and `link-aarch64` are shorthand for linking an LLVM-specific symbol to a function name and then calling it.
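
To show roughly how the spec fields end up in the function above, here is a Python sketch that fills a text template. It is not how `stdarch-gen` is implemented (that tool is written in Rust); `TEMPLATE` and `render` are made up for illustration, and the concrete type and suffix are passed in by hand because the expansion rules for patterns like `int*_t` are not described in this thread.

```python
# Template mirroring the generated vorr_s8 function shown above.
TEMPLATE = '''\
/// {doc}
#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
#[cfg_attr(all(test, target_arch = "arm"), assert_instr({arm}))]
#[cfg_attr(all(test, target_arch = "aarch64"), assert_instr({aarch64}))]
pub unsafe fn {name}_{suffix}(a: {ty}, b: {ty}) -> {ty} {{
    {fn}(a, b)
}}
'''

# Fields taken from the vorr stanza in neon.spec.
SPEC_ENTRY = {
    "doc": "Vector bitwise or (immediate, inclusive)",
    "name": "vorr",
    "fn": "simd_or",
    "arm": "vorr",
    "aarch64": "orr",
}


def render(entry, suffix, ty):
    """Render one two-operand intrinsic for a single concrete vector type."""
    return TEMPLATE.format(suffix=suffix, ty=ty, **entry)


rust_source = render(SPEC_ENTRY, "s8", "int8x8_t")
print(rust_source)
```

In this sketch, `arm` and `aarch64` feed the `assert_instr` checks, `fn` becomes the body, and `name` plus a type suffix forms the function name.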

I hope it helps :) I can probably try to poke the author of the script to document it a little better.
