Error in types for vec_bperm #53

Closed
opened 8 years ago by wschmidt-ibm · 0 comments
wschmidt-ibm commented 8 years ago (Migrated from github.com)

Point brought to my attention by Ian. We are in agreement that this should be fixed. Text of his note below.

In general:
The XL compilers have not yet implemented vector [un]signed __int128. It's ok for the ABI to specify it, but (1) it should be marked as "phased in", and (2) if XL users need to use a BIF now then there needs to be an alternate prototype using another type like vector unsigned char. Of course we should be implementing vector [un]signed __int128, and I've requested that, but it's been deferred several times due to higher priority needs.

In this case:
The ABI says vec_bperm is supposed to generate either a vbpermq or vbpermd instruction depending on the type:
vector unsigned __int128 ==> Power8 vbpermq (1 16-bit result)
vector unsigned char or vector unsigned long long ==> Power9 vbpermd (2 8-bit results)
That's incompatible with the previous definition:
vector unsigned char ==> Power8 vbpermq (1 16-bit result)
It should have been:
vector unsigned char or vector unsigned __int128 ==> Power8 vbpermq (1 16-bit result)
vector unsigned long long ==> Power9 vbpermd (2 8-bit results)
which adds vector unsigned __int128 as a synonym parameter type for the Power8 BIF and instruction, and adds the new Power9 BIF and instruction.
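
The corrected mapping can be sketched as a type-based dispatch. This is only an illustration in portable C11: the `vec_u8`, `vec_u128`, and `vec_u64` structs are hypothetical stand-ins for vector unsigned char, vector unsigned __int128, and vector unsigned long long, and `bperm_insn` is a made-up name, not a real built-in. It shows which instruction each argument type would select under the proposed fix.

```c
#include <string.h>

/* Hypothetical stand-ins for the three AltiVec argument types. */
typedef struct { unsigned char b[16]; } vec_u8;        /* vector unsigned char      */
typedef struct { unsigned char b[16]; } vec_u128;      /* vector unsigned __int128  */
typedef struct { unsigned long long d[2]; } vec_u64;   /* vector unsigned long long */

/* Each helper names the instruction the proposed mapping selects. */
static const char *insn_for_u8(vec_u8 a)     { (void)a; return "vbpermq"; }
static const char *insn_for_u128(vec_u128 a) { (void)a; return "vbpermq"; }
static const char *insn_for_u64(vec_u64 a)   { (void)a; return "vbpermd"; }

/* C11 _Generic picks the helper from the static type of the argument. */
#define bperm_insn(a) _Generic((a),      \
        vec_u8:   insn_for_u8,           \
        vec_u128: insn_for_u128,         \
        vec_u64:  insn_for_u64)(a)
```

With this sketch, an argument of type `vec_u8` or `vec_u128` selects "vbpermq" and an argument of type `vec_u64` selects "vbpermd", which keeps the existing vector unsigned char behavior intact.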

These instructions are confusing, and I believe our ABI team (including me) was confused by them. I think the XL compiler implementation is the way it should be, and the ABI should be corrected. The other questions are what the GCC team implemented and what the LLVM team is implementing.

The XL compiler documentation at http://www.ibm.com/support/knowledgecenter/zh/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/compiler_ref/vec_bperm_p8.html says:
vec_bperm

Purpose
Gathers up to 16 1-bit values from a quadword in the specified order, and places them in the specified order in the rightmost 16 bits of the leftmost doubleword of the result vector register, with the rest of the result zeroed.
This built-in function is valid only when -qarch is set to target POWER8™ processors.
Syntax
d=vec_bperm(a, b)
Result and argument types
The type of d, a, and b must be vector unsigned char.
Result value
For each i (0 <= i < 16), let index denote the byte value of the ith element of b.
If index is greater than or equal to 128, bit 48+i of the result is set to 0.
If index is smaller than 128, bit 48+i of the result is set to the value of the indexth bit of input a.

That puts 16 result bits right justified into the left half of the result and zeros the right half, the same as the Power8 vbpermq instruction.

Vector Bit Permute Quadword VX-form
vbpermq VRT,VRA,VRB

if MSR.VEC=0 then Vector_Unavailable()
do i = 0 to 15
   index <= VR[VRB].byte[i]
   if index < 128 then
      perm.bit[i] <= VR[VRA].bit[index]
   else
      perm.bit[i] <= 0
end
VR[VRT].dword[0] <= Chop(EXTZ(perm),64)
VR[VRT].dword[1] <= 0x0000_0000_0000_0000

For each integer value i from 0 to 15, do the following.
Let index be the contents of byte element i of VR[VRB].
If index is less than 128, then the contents of bit index of VR[VRA] are placed into bit 48+i of doubleword element 0 of VR[VRT]. Otherwise, bit 48+i of doubleword element 0 of VR[VRT] is set to 0.
The contents of bits 0:47 of VR[VRT] are set to 0.
The contents of bits 64:127 of VR[VRT] are set to 0.

Special Registers Altered:
None

That puts 16 result bits right justified into the left half of the result and zeros the right half.
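
To make the bit numbering concrete, here is a minimal scalar model of the vbpermq description above, written in portable C rather than with the real built-in. The `vreg128` type and the `get_bit` and `model_vbpermq` names are assumptions made for this sketch. The register is modelled in Power ISA order, with byte 0 leftmost and bit 0 the most significant bit.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register in Power ISA order:
   byte[0] is the leftmost byte, and bit 0 is the MSB of byte[0]. */
typedef struct { uint8_t byte[16]; } vreg128;

/* Return bit `index` (0..127, big-endian bit numbering) of v. */
static unsigned get_bit(const vreg128 *v, unsigned index)
{
    return (v->byte[index / 8] >> (7 - index % 8)) & 1u;
}

/* Scalar model of "vbpermq VRT,VRA,VRB" as described in the text. */
static vreg128 model_vbpermq(vreg128 a, vreg128 b)
{
    vreg128 t;
    uint16_t perm = 0;

    memset(&t, 0, sizeof t);            /* bits 0:47 and 64:127 stay zero */
    for (unsigned i = 0; i < 16; i++) {
        unsigned index = b.byte[i];
        unsigned bit = (index < 128) ? get_bit(&a, index) : 0u;
        /* perm.bit[i]: bit 0 of perm is its most significant bit */
        perm = (uint16_t)(perm | (bit << (15 - i)));
    }
    /* perm lands in bits 48:63 of the register, i.e. bytes 6 and 7 */
    t.byte[6] = (uint8_t)(perm >> 8);
    t.byte[7] = (uint8_t)(perm & 0xFF);
    return t;
}
```

In this model, all 16 selected bits end up in bytes 6 and 7 (the right end of the left doubleword), and the right doubleword is always zero.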

The ISA v3.0 (Power9) vbpermd instruction description:
Vector Bit Permute Doubleword VX-form
vbpermd VRT,VRA,VRB

For each integer value i from 0 to 1, and for each integer value j from 0 to 7, do the following.
Let index be the contents of byte sub-element j of doubleword element i of VR[VRB].
If index is less than 64, then the contents of bit index of doubleword i of VR[VRA] are placed into bit 56+j of doubleword element i of VR[VRT].
Otherwise, bit 56+j of doubleword element i of VR[VRT] is set to 0.
The contents of bits 0:55 of doubleword element i of VR[VRT] are set to 0.
Special Registers Altered:
None

That puts 8 result bits right justified into the left half of the result and 8 result bits right justified into the right half.
It's a SIMD version of the Power7 bpermd instruction, which puts 8 result bits right justified into a GPR.
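
For comparison, the vbpermd description can be modelled the same way. This is a sketch under stated assumptions, not compiler or hardware code: `vreg2x64`, `model_bperm8`, and `model_vbpermd` are hypothetical names, `dword[0]` stands for the left doubleword, and bit 0 of each doubleword is its most significant bit.

```c
#include <stdint.h>

/* Hypothetical model of a vector register as two doublewords;
   dword[0] is the left half, dword[1] the right half. */
typedef struct { uint64_t dword[2]; } vreg2x64;

/* Per-doubleword step: gather 8 bits of `a`, selected by the 8 index
   bytes of `b`, into bits 56:63 (the low byte) of the result. */
static uint64_t model_bperm8(uint64_t a, uint64_t b)
{
    uint64_t result = 0;

    for (unsigned j = 0; j < 8; j++) {
        /* byte sub-element j, counting from the most significant byte */
        unsigned index = (unsigned)((b >> (56 - 8 * j)) & 0xFF);
        uint64_t bit = (index < 64) ? ((a >> (63 - index)) & 1u) : 0u;
        result |= bit << (7 - j);       /* big-endian bit 56+j */
    }
    return result;
}

/* Scalar model of "vbpermd VRT,VRA,VRB": the same step on each half. */
static vreg2x64 model_vbpermd(vreg2x64 a, vreg2x64 b)
{
    vreg2x64 t;

    for (unsigned i = 0; i < 2; i++)
        t.dword[i] = model_bperm8(a.dword[i], b.dword[i]);
    return t;
}
```

The contrast with vbpermq is visible in the shape of the result: each doubleword receives its own 8-bit result, and indices of 64 or more (rather than 128 or more) select zero.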

Reference: systemsoftware/ELFv2-ABI#53