vec_bperm picture
#14
Closed
opened 5 years ago by wschmidt-ibm
·
8 comments
Loading…
Reference in New Issue
There is no content yet.
Delete Branch '%!s(<nil>)'
Deleting a branch is permanent. It CANNOT be undone. Continue?
PC has requested a picture.
vec_bperm
presents some unique issues when running in little-endian mode.The function signatures of the builtin are:
The underlying instructions used for the implementation are, respectively:
vbpermq
vbpermq
vbpermd
Comparing the first two instances which use the same underlying implementation...
Define a constant permute vector:
...which seeks to collect the low-order 16 bits of the source vector.
Define vectors which have the same in-register representation:
The in-register (big-endian) representation for both is:
Since the same instruction is used identically in the implementation, the result is the same:
This doesn't match the current description in the document, which says:
I think the above statement should instead say:
...at least for a return type of
vector unsigned long long
.The above statement would have to be changed as follows for a return type of
vector unsigned char
:...or something like that. A simpler solution would be to define the return value in this case to be
unsigned long long
.Questions:
unsigned long long
in all cases?You're misunderstanding the endian restriction, I think. You have to provide data in big-endian order, and expect results in big-endian order, with all bit and byte designations defined in big-endian order. There is no expectation that providing little-endian data order will make any sense. The probable best to-do for this is to clarify this in the description.
The same is true for things like vec_pmsum_be. The data model for using such things is purely big-endian.
What we need to do is review all of these big-endian-only instructions and review the need for little-endian versions of them supported in a future version of the ISA. This may lead to creating new builtins that are guaranteed to handle both BE and LE usages.
Actually, the "Endian considerations" already seem to describe this pretty adequately, but if you want to tickle that language, please suggest something.
This also applies to vec_cipher_be, vec_cipherlast_be, vec_gb, vec_ncipher_be, vec_ncipherlast_be, vec_pmsum_be, vec_sbox_be, and vec_shasigma_be. It's unfortunate that vec_bperm wasn't originally named vec_bperm_be, but it's too late to correct this oversight except by allowing an alias.
How does this:
ever match this:
Even the ISA doesn't match the language immediately above:
The instruction stores the results in the low-order bits of the first (high-order) doubleword element of the vector. So, the description in this document is not correct, even for BE. And, it is non-sensical when treated as
vector unsigned char
, because the results show up in elements [6..7] in BE.As an example, this series of code attempts to BE-ize and un-BE-ize the use of the instruction on an LE system, here the goal is simply to produce the low-order (BE) 16 bits, in reverse:
(Note that I display the "BE" vectors in BE-order.) The output is:
The low order 16 bits are
FF A9
, which becomes95 FF
when reversed.Note that this result shows up in elements 6 and 7 of the
vector unsigned char
in both BE and LE (after swapping).This simply does not match the current description in this document.
The description is definitely wrong, especially for vbpermq. This has been misdescribed since the original document was written. I will work on a better description for it that matches the ISA.
But please don't use LE examples, because they simply do not apply meaningfully to these instructions and it complicates everything. vec_bperm does not follow the bi-endian programming model.
For vbpermq, the only error is that instead of "bit i of r", the third bullet should say "bit i+48 of r". That leaves bits 0:47 and 64:127 as 0, as they should be.
For vbpermd, the first subbullet should replace "jth element of b" with "(i*8+j)th element of b." The second and third subbullets should replace "bit j" with "bit 56+j".
That should follow the ISA correctly. Thanks for pointing this out! Please let me know if you disagree with those changes.
Opened #47 to track this; I'll fix.
Description fixed in #47. Please let me know if a BE example now makes better sense.