Review comments from Steve Munroe (pages 1-15) #6

1.1 A brief History
Page 1

“The VSRs can represent all the data types representable by the VRs, and can also be treated as containing two 64-bit integers or two 64-bit double-precision floating-point values.”

Sadly this (two 64-bit integers) is not supported until PowerISA 2.07 (POWER8).

POWER7 only had some conversions between floating-point and doubleword integer, plus Permute Doubleword Immediate, but no vector doubleword integer operations.

In POWER8/9 all Vector Doubleword Integer operations were VMX encoded (restricted to the 32 VRs).

I was appalled by this ;)

Also
“a VSR can now contain a single 128-bit integer;”

Technically a VSR may contain a Quadword Integer, but Vector Quadword Integer operations are only possible in VRs (VSRs 32-63).

A VSR may also contain a single 31-digit signed BCD integer (for add/subtract).

A VSR may also contain a single 16-digit signed Zoned Decimal integer (for conversion to/from BCD).

and
“and starting with POWER9, a VSR can contain a single 128-bit floating-point value”

Technically a VSR may contain a Quadword Float128, but Vector Quadword FP operations are only possible in VRs (VSRs 32-63).

2.1. Language Elements
Page 4-5

“Vector literals may be specified using a type cast and a set of literal initializers in parentheses or braces.”

Note that current C compilers do not support literals (either decimal or hexadecimal) for __int128 or BCD types.

Code that uses int or long long type literals to construct vector __int128 constants needs to be endian aware and reverse the element order for LE.
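A minimal sketch of what "endian aware" means here. To stay compilable without altivec.h it uses the GCC/Clang generic vector extension rather than `vector unsigned long long`, and the names `VINT128_FROM_DW`, `v2du`, `make_u128` and `v2du_as_u128` are made up for illustration (they are not from the manual or from PVECLIB). The same element-order swap applies when initializing a real `vector __int128` from doubleword literals.

```c
#include <stdint.h>
#include <string.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension on 64-bit targets */
typedef uint64_t v2du __attribute__((vector_size(16)));  /* two doublewords */

/* No 128-bit literals exist, so build the scalar from two 64-bit halves. */
u128 make_u128(uint64_t hi, uint64_t lo)
{
    return ((u128)hi << 64) | lo;
}

/* Vector initializers are listed in element order, and element 0 is the
 * low doubleword on LE but the high doubleword on BE, so swap for LE. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define VINT128_FROM_DW(hi, lo) ((v2du){ (lo), (hi) })
#else
#define VINT128_FROM_DW(hi, lo) ((v2du){ (hi), (lo) })
#endif

/* Reinterpret the 16 bytes of the vector as one quadword integer. */
u128 v2du_as_u128(v2du v)
{
    u128 q;
    memcpy(&q, &v, sizeof q);
    return q;
}
```

With the macro, `VINT128_FROM_DW(0x0123456789abcdefULL, 0xfedcba9876543210ULL)` holds the same quadword value as `make_u128` with the same arguments on either endianness, which is the property a hand-written `{hi, lo}` initializer loses on LE.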

2.2. Vector Data Types
Page 5

“One vector type may be cast to another vector type without restriction.”

Note: despite the fact that Float128 (__float128) types are held and operated on in VRs, they are considered to be scalars and cannot be cast to any vector type.

But vector __int128 is a vector type, go figure …

2.4. Vector Layout and Element Numbering
page 7

“Consequently, the vector numbering schemes can be described as big-endian and little-endian vector layouts and vector element numberings.”

This gets a little weird when input/output element sizes are different, for example with vector integer multiply even/odd or unpack. Vector Multiply Odd Signed Word takes two vector int and returns vector long long. For LE this is converted to a Vector Multiply Even Signed Word instruction.

But the doubleword product elements are BE (byte, halfword, word) internally. This turns into a head scratcher if you need to access words for the multiply low word result or are doing multiple-precision integer arithmetic for multiply quadword.
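The even/odd swap can be modelled in plain scalar C. This is only a sketch of the element numbering, not the intrinsics themselves: arrays stand in for registers, index 0 is the leftmost (BE-numbered) register element, and all function names are hypothetical.

```c
#include <stdint.h>

/* Hardware view: elements numbered left to right (BE numbering). */
void hw_mul_even_sw(const int32_t a[4], const int32_t b[4], int64_t r[2])
{
    r[0] = (int64_t)a[0] * b[0];
    r[1] = (int64_t)a[2] * b[2];
}

void hw_mul_odd_sw(const int32_t a[4], const int32_t b[4], int64_t r[2])
{
    r[0] = (int64_t)a[1] * b[1];
    r[1] = (int64_t)a[3] * b[3];
}

/* BE program: program element i is register element i, so "multiply odd"
 * maps directly onto the odd instruction. */
void model_be_mulo(const int32_t a[4], const int32_t b[4], int64_t r[2])
{
    hw_mul_odd_sw(a, b, r);
}

/* LE program: program element i is register element 3 - i, so the
 * program's odd elements sit in the register's even positions. */
void model_le_mulo(const int32_t a[4], const int32_t b[4], int64_t r[2])
{
    int32_t ra[4], rb[4];
    int64_t rr[2];
    for (int i = 0; i < 4; i++) {
        ra[i] = a[3 - i];
        rb[i] = b[3 - i];
    }
    hw_mul_even_sw(ra, rb, rr);  /* even instruction, odd program elements */
    r[0] = rr[1];                /* result doublewords are reversed too */
    r[1] = rr[0];
}
```

Both models return the products of program elements 1 and 3, which is the bi-endian promise. What the model does not show is the word order inside each doubleword product, which is where the head scratching starts.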

2.7.3. Limitations on bi-endianness of vec_perm
page 11

“If you must use vec_perm for another purpose, your code must include a test for endianness and separate algorithms for big- and little-endian.”

Like, yeah! See PVECLIB for examples!

3.2. Use Portable Intrinsics
page 13

If you are using an older version of GCC or Clang you may find that not all of the intrinsics in this manual are implemented, or that they are not yet implemented for the specific type you are using.

They could also be marked ISA 3.0 or later, Phased in, Deferred, or Deprecated.

Another reason to look at PVECLIB
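For the "ISA 3.0 or later" case, one common way to guard code is to test the target macros at compile time. The sketch below assumes the `_ARCH_PWR7`, `_ARCH_PWR8` and `_ARCH_PWR9` macros that GCC and Clang predefine according to `-mcpu`; the function name `target_isa_level` is hypothetical and is not how PVECLIB itself is organized.

```c
/* Coarse ISA level of the compile target:
 * 300 = ISA 3.0 (POWER9), 207 = ISA 2.07 (POWER8),
 * 206 = ISA 2.06 (POWER7), 0 = anything else. */
int target_isa_level(void)
{
#if defined(_ARCH_PWR9)
    return 300;
#elif defined(_ARCH_PWR8)
    return 207;
#elif defined(_ARCH_PWR7)
    return 206;
#else
    return 0;
#endif
}
```

In real code the same `#if` ladder would select between an ISA 3.0 intrinsic and an equivalent sequence for older processors, rather than return a number.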

3.3. Use Assembly Code Sparingly
page 13

Another reason to look at PVECLIB

4.1. How to Use This Reference
page 15

I recommend adding latency/throughput numbers for P8/P9. It is such a pain to look these up in the UM, and it may not be obvious where the compiler generates multiple-instruction sequences.

“ISA 3.0 or later”
PVECLIB provides POWER7/8 equivalent implementations for many ISA 3.0 vector instructions.


“May also contain a single 16 digit signed Zoned Decimal integer (for conversion to/from BCD).”

This is new for POWER9.


commit 4a0e35f1d6b5834c5c6076ac19fc3756b08f9102 (HEAD -> master, origin/master, origin/HEAD)
Author: Bill Schmidt <wschmidt@linux.ibm.com>
Date: Thu Oct 17 11:54:45 2019 -0500

Make updates for comments received so far, including issue #4 and
issue #5.  XL bug report support for Linux is still pending.

(Should have said issue #5 and #6.)
