vec_gather and vec_scatter would be useful to have for float32/64 and int32/64 #87

Open
opened 9 months ago by ChipKerchner · 5 comments

v = vec_gather(ptr0, ptr1, ptr2, ptr3)
vec_scatter(v, ptr0, ptr1, ptr2, ptr3)
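
To make the request concrete, here is a minimal portable sketch of what the four-pointer form could look like for float32. The type name `v4f32` and the function names `vec_gather_f32` / `vec_scatter_f32` are hypothetical, and the code uses the GCC/Clang generic vector extension rather than any existing PowerPC intrinsic, so treat it as an illustration of the semantics only.

```c
#include <assert.h>

/* Hypothetical 4 x float32 vector type (GCC/Clang vector extension). */
typedef float v4f32 __attribute__((vector_size(16)));

/* Gather: load one float from each of four independent addresses
 * and pack them into lanes 0..3 (assumed lane order). */
static inline v4f32 vec_gather_f32(const float *p0, const float *p1,
                                   const float *p2, const float *p3)
{
    assert(p0 && p1 && p2 && p3);
    v4f32 v = { *p0, *p1, *p2, *p3 };
    return v;
}

/* Scatter: store lanes 0..3 to four independent addresses. */
static inline void vec_scatter_f32(v4f32 v, float *p0, float *p1,
                                   float *p2, float *p3)
{
    assert(p0 && p1 && p2 && p3);
    *p0 = v[0];
    *p1 = v[1];
    *p2 = v[2];
    *p3 = v[3];
}
```

The int32, float64 and int64 variants would follow the same pattern, with two pointers instead of four for the 64-bit element types.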

Poster

Should ptr1, ptr2, ptr3 be optional?

ChipKerchner changed title from vec_gather and vec_scatter would be useful to have to vec_gather and vec_scatter would be useful to have for float32/32 and int32/64 9 months ago
ChipKerchner changed title from vec_gather and vec_scatter would be useful to have for float32/32 and int32/64 to vec_gather and vec_scatter would be useful to have for float32/64 and int32/64 9 months ago
Poster

Another option is to have a single base pointer and a stride.
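
A rough sketch of the base-plus-stride variant, for int32. The names `v4i32`, `vec_gather_stride_i32` and `vec_scatter_stride_i32` are hypothetical, and the stride is assumed to be counted in elements rather than bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4 x int32 vector type (GCC/Clang vector extension). */
typedef int32_t v4i32 __attribute__((vector_size(16)));

/* Gather four elements starting at base, 'stride' elements apart.
 * Assumption: stride is in elements, not bytes. */
static inline v4i32 vec_gather_stride_i32(const int32_t *base, ptrdiff_t stride)
{
    assert(base != NULL);
    v4i32 v = { base[0], base[stride], base[2 * stride], base[3 * stride] };
    return v;
}

/* Scatter the four lanes back to base, 'stride' elements apart. */
static inline void vec_scatter_stride_i32(v4i32 v, int32_t *base, ptrdiff_t stride)
{
    assert(base != NULL);
    base[0]          = v[0];
    base[stride]     = v[1];
    base[2 * stride] = v[2];
    base[3 * stride] = v[3];
}
```

With a row-major 4x4 matrix, `vec_gather_stride_i32(&m[0][1], 4)` would pull out column 1.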


I have versions of these implemented in PVECLIB (master but not release 1.0.4).
For example: https://munroesj52.github.io/vec__f32__ppc_8h.html#a80b2d79f8d3afb30b92638ad78c80ca9

Also: https://munroesj52.github.io/vec__int32__ppc_8h.html#aedabb123d258fd438bda778262d5b33f
https://munroesj52.github.io/vec__f64__ppc_8h.html#aeb96abe1cd1f78c4eee978add6268c80
https://munroesj52.github.io/vec__int64__ppc_8h.html#af18331cf83e5ec6965978589f812ead9

As you can see, these cover both the scalar-offset and vector-index forms. The scalar form performs best, because the vector forms require a transfer (and shift left) of each element to a GPR scalar before the actual load/store.
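
To illustrate the difference between the two forms, here is a portable sketch for float64. It is not the pveclib code: the names are hypothetical, and it assumes byte offsets for the scalar form and element indexes for the vector form. The extra work in the vector form (extracting each lane to a scalar and scaling it) stands in for the transfer/shift cost mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vector types (GCC/Clang vector extension). */
typedef double  v2f64 __attribute__((vector_size(16)));
typedef int64_t v2i64 __attribute__((vector_size(16)));

/* Scalar-offset form: both byte offsets are already scalars,
 * so each element is one indexed load. */
static inline v2f64 gather_f64_scalar_offsets(const double *base,
                                              int64_t off0, int64_t off1)
{
    assert(base != NULL);
    const char *p = (const char *)base;
    double e0, e1;
    memcpy(&e0, p + off0, sizeof e0);
    memcpy(&e1, p + off1, sizeof e1);
    v2f64 v = { e0, e1 };
    return v;
}

/* Vector-index form: the element indexes arrive in a vector, so each
 * lane is first moved to a scalar and scaled by 8 (the equivalent of a
 * shift left by 3) before it can be used to address memory. */
static inline v2f64 gather_f64_vector_index(const double *base, v2i64 idx)
{
    int64_t off0 = idx[0] * (int64_t)sizeof(double);
    int64_t off1 = idx[1] * (int64_t)sizeof(double);
    return gather_f64_scalar_offsets(base, off0, off1);
}
```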

You can pull from https://github.com/open-power-sdk/pveclib and try this yourself.

Poster

Are there any plans for P10 support? There are new ISA 3.1 instructions that might be helpful. Also, does your version support both BE and LE?


Have already updated/added ~70 operations for P10. Mostly exploiting P10 instructions for existing pveclib double/quadword integer operations.

Still need to get a handle on integer divide for P7/8/9.
Currently working on Float128 round-to-odd operations for P7/8. The compiler has not always been helpful here; this work has exposed a number of compiler bugs.

If you are looking for P10's VSX Matrix-Multiply Assist (MMA), the compiler (GCC 11) already provides built-in support:
https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/PowerPC-Matrix-Multiply-Assist-Built-in-Functions.html#PowerPC-Matrix-Multiply-Assist-Built-in-Functions

If you are looking for transparent pveclib support for MMA, that is a bigger kettle of fish. For P7/8/9 this would require emulation of the MMA Accumulators and software MMA kernels for the various matrix operations and types.
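
To give a sense of what such emulation involves, here is a minimal sketch of a software accumulator for the float32 case only: a 4x4 accumulator updated with a rank-1 (outer product) step. The type `emu_acc_f32` and the functions `emu_acc_zero` / `emu_ger_f32_pp` are hypothetical names for illustration and are not part of pveclib or GCC. A real implementation would also have to cover the other element types, the negated/masked variants, and the accumulator move operations.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical software stand-in for one 512-bit MMA accumulator,
 * viewed as a 4x4 matrix of float32. */
typedef struct {
    float row[4][4];
} emu_acc_f32;

/* Clear the accumulator (all-zero bits are 0.0f for IEEE floats). */
static inline void emu_acc_zero(emu_acc_f32 *acc)
{
    assert(acc != NULL);
    memset(acc, 0, sizeof *acc);
}

/* Rank-1 update with positive accumulate:
 * acc[i][j] += a[i] * b[j] for all i, j in 0..3. */
static inline void emu_ger_f32_pp(emu_acc_f32 *acc,
                                  const float a[4], const float b[4])
{
    assert(acc != NULL);
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            acc->row[i][j] += a[i] * b[j];
        }
    }
}
```

Even this simplest case needs 16 multiply-adds per step in scalar code, which hints at how much work a full set of emulated kernels would be.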

This is a big job for one person's part-time hobby.

And it is not clear how many users there are for this low level API. I assume there are a small number of library developers, with the majority just using the library APIs? But library developers seem to be the target audience for pveclib.

Anyway for now I need to complete Float128 and vector integer divide.
