vec_gather and vec_scatter would be useful to have for float32/64 and int32/64
v = vec_gather(ptr0, ptr1, ptr2, ptr3)
vec_scatter(v, ptr0, ptr1, ptr2, ptr3)
Another option is to have a single base pointer and a stride.
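To make the two proposed forms concrete, here is a minimal portable C sketch. The type `f32x4` and all four function names are hypothetical and chosen only for illustration; they are not the pveclib API, and the stride is assumed to be counted in elements rather than bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4 x float32 vector; stands in for a real vector register. */
typedef struct { float e[4]; } f32x4;

/* Pointer form: each lane is loaded from its own address. */
static f32x4 gather_f32x4(const float *p0, const float *p1,
                          const float *p2, const float *p3)
{
    f32x4 v = { { *p0, *p1, *p2, *p3 } };
    return v;
}

/* Pointer form: each lane is stored to its own address. */
static void scatter_f32x4(f32x4 v, float *p0, float *p1,
                          float *p2, float *p3)
{
    *p0 = v.e[0];
    *p1 = v.e[1];
    *p2 = v.e[2];
    *p3 = v.e[3];
}

/* Base + stride form: lane i reads base[i * stride].
   Assumption: stride is in elements, not bytes. */
static f32x4 gather_stride_f32x4(const float *base, ptrdiff_t stride)
{
    assert(base != NULL);
    return gather_f32x4(base, base + stride,
                        base + 2 * stride, base + 3 * stride);
}

/* Base + stride form: lane i writes base[i * stride]. */
static void scatter_stride_f32x4(f32x4 v, float *base, ptrdiff_t stride)
{
    assert(base != NULL);
    scatter_f32x4(v, base, base + stride,
                  base + 2 * stride, base + 3 * stride);
}
```

The stride form is just the pointer form with the four addresses derived from one base, so an implementation could offer both without duplicating the load/store logic.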
I have versions of these implemented in PVECLIB (master but not release 1.0.4).
For example: https://munroesj52.github.io/vec__f32__ppc_8h.html#a80b2d79f8d3afb30b92638ad78c80ca9
As you can see, it covers both the scalar offset and vector index forms. The scalar form performs best, as the vector forms require a transfer/shift left to GPR scalars for the actual load/store.
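A rough portable C sketch of why the vector index form costs more. The types and function names below are hypothetical, not the actual pveclib signatures. The assumption is that the scalar form takes byte offsets that are already in general-purpose registers, while the vector form must first extract each index from the vector and scale it by the element size (the "shift left" step).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 2 x float64 and 2 x int64 vectors. */
typedef struct { double e[2]; } f64x2_t;
typedef struct { int64_t e[2]; } i64x2_t;

/* Scalar offset form: byte offsets arrive as plain scalars,
   so each lane is a direct base + offset load. */
static f64x2_t gather_f64_scalar_offsets(const double *base,
                                         int64_t off0, int64_t off1)
{
    f64x2_t v;
    assert(base != NULL);
    memcpy(&v.e[0], (const char *)base + off0, sizeof(double));
    memcpy(&v.e[1], (const char *)base + off1, sizeof(double));
    return v;
}

/* Vector index form: element indices live in a vector. Each one is
   moved to a scalar and scaled to a byte offset (equivalent to a
   shift left by 3 for doubles) before the loads can be issued. */
static f64x2_t gather_f64_vector_index(const double *base, i64x2_t vidx)
{
    int64_t off0 = vidx.e[0] * (int64_t)sizeof(double);
    int64_t off1 = vidx.e[1] * (int64_t)sizeof(double);
    return gather_f64_scalar_offsets(base, off0, off1);
}
```

In this model the vector form is the scalar form plus extra per-lane work, which matches the performance ordering described above.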
You can pull from https://github.com/open-power-sdk/pveclib and try this yourself.
Are there any plans for P10 support, since there are new ISA 3.1 instructions that might be helpful? Also, does your version support both BE and LE?
Have already updated/added ~70 operations for P10. Mostly exploiting P10 instructions for existing pveclib double/quadword integer operations.
Still need to get a handle on integer divide for P7/8/9.
Currently working on Float128 round-to-odd operations for P7/8. The compiler has not always been helpful, with a number of exposed bugs.
If you are looking for P10's VSX Matrix-Multiply Assist (MMA), the compiler (GCC 11) already provides builtin support:
If you are looking for transparent pveclib support for MMA, that is a bigger kettle of fish. For P7/8/9 this would require emulation of the MMA Accumulators and software MMA kernels for the various matrix operations and types.
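To give a sense of what such emulation involves, here is a minimal sketch in portable C of one software accumulator and one kernel. It loosely models an FP32 outer-product-accumulate on a 4x4 accumulator. The type `sw_acc_f32` and both function names are hypothetical, and the sketch makes no claim to match the rounding behavior or data layout of the hardware instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical software accumulator: a 4x4 tile of float32,
   i.e. 512 bits, held in memory or across four vector registers. */
typedef struct { float r[4][4]; } sw_acc_f32;

/* Clear every element of the accumulator. */
static void sw_acc_zero(sw_acc_f32 *acc)
{
    assert(acc != NULL);
    for (size_t i = 0; i < 4; i++)
        for (size_t j = 0; j < 4; j++)
            acc->r[i][j] = 0.0f;
}

/* Rank-1 update: acc[i][j] += x[i] * y[j].
   Plain scalar arithmetic; not bit-exact with any hardware unit. */
static void sw_f32_ger_pp(sw_acc_f32 *acc,
                          const float x[4], const float y[4])
{
    assert(acc != NULL);
    for (size_t i = 0; i < 4; i++)
        for (size_t j = 0; j < 4; j++)
            acc->r[i][j] += x[i] * y[j];
}
```

A full emulation would need a kernel like this for each element type and each accumulate/negate variant, plus the accumulator move and store operations, which is where the size of the job comes from.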
This is a big job for one person's part-time hobby.
And it is not clear how many users there are for this low-level API. I assume there is a small number of library developers, with the majority just using the library APIs. But library developers seem to be the target audience for pveclib.
Anyway for now I need to complete Float128 and vector integer divide.