From b521f3e0b37c903b10f88ca9ab572f22971a7897 Mon Sep 17 00:00:00 2001 From: Bill Schmidt Date: Fri, 1 May 2020 14:37:49 -0500 Subject: [PATCH 1/3] Add pcv descriptions for vec_reve & vec_extract_fp32_from_short[hl] --- Intrinsics_Reference/ch_vec_reference.xml | 113 ++++++++++++++++++++-- 1 file changed, 104 insertions(+), 9 deletions(-) diff --git a/Intrinsics_Reference/ch_vec_reference.xml b/Intrinsics_Reference/ch_vec_reference.xml index bf8e5b7..0dccfd4 100644 --- a/Intrinsics_Reference/ch_vec_reference.xml +++ b/Intrinsics_Reference/ch_vec_reference.xml @@ -13210,9 +13210,16 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.vec-ref"> single-precision IEEE numbers. Endian considerations: The element numbering within a register is left-to-right for big-endian - targets, and right-to-left for little-endian targets. Thus the - permute control vector at address pcv - in the example implementation will differ for big- and little-endian. + targets, and right-to-left for little-endian targets. + + Notes: The example + implementation assumes that the vperm instruction is used for + big-endian, and the vpermr instruction is used for + little-endian. The permute control vector for the vperm or + vpermr instruction is in a memory location identified by pcv. + The value located at pcv is identical in natural element order + for big- and little-endian: { 15, 14, 0, 0, 13, 12, 0, 0, 11, + 10, 0, 0, 9, 8, 0, 0 }. @@ -13223,6 +13230,10 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.vec-ref"> vperm vec_extract_fp32_from_shorth + + vpermr + vec_extract_fp32_from_shorth + xvcvhpsp vec_extract_fp32_from_shorth @@ -13266,7 +13277,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.vec-ref"> lxv t,0(pcv) - vperm u,a,a,t + vperm[r] u,a,a,t xvcvhpsp r,u @@ -13300,9 +13311,16 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.vec-ref"> single-precision IEEE numbers. 
Endian considerations: The element numbering within a register is left-to-right for big-endian - targets, and right-to-left for little-endian targets. Thus the - permute control vector at address pcv - in the example implementation will differ for big- and little-endian. + targets, and right-to-left for little-endian targets. + + Notes: The example + implementation assumes that the vperm instruction is used for + big-endian, and the vpermr instruction is used for + little-endian. The permute control vector for the vperm or + vpermr instruction is in a memory location identified by pcv. + The value located at pcv is identical in natural element order + for big- and little-endian: { 7, 6, 0, 0, 5, 4, 0, 0, 3, 2, 0, + 0, 1, 0, 0, 0 }. @@ -13313,6 +13331,10 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.vec-ref"> vperm vec_extract_fp32_from_shortl + + vpermr + vec_extract_fp32_from_shortl + xvcvhpsp vec_extract_fp32_from_shortl @@ -13356,7 +13378,7 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.vec-ref"> lxv t,0(pcv) - vperm u,a,a,t + vperm[r] u,a,a,t xvcvhpsp r,u @@ -25636,8 +25658,81 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.vec-ref"> Notes: The example implementations assume that the permute control vector for the vperm or vpermr instruction is in a register identified by pcv. - The value of pcv differs based on the element size. + The value of pcv differs based on the element size, and is the + same (in natural element order) for big- and little-endian, + assuming the use of vperm for big-endian and vpermr for + little-endian. 
+ + + + + + + + Vector types + + + Permute control vector + + + + + + + + vector char + + + + + { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, + 0 } + + + + + + + vector short + + + + + { 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, + 1 } + + + + + + + vector int, vector float + + + + + { 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, + 3 } + + + + + + + vector long long, vector double + + + + + { 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, + 7 } + + + + + + vperm -- 2.34.1 From 8c13a59ceb2e9ca89861f5f84999d0ceeecf83f8 Mon Sep 17 00:00:00 2001 From: "Paul A. Clarke" Date: Thu, 30 Apr 2020 20:49:27 -0500 Subject: [PATCH 2/3] Add example for vec_permxor `vec_permxor` is complex enough to warrant an example. The table used is very wide. I squeezed it more by changing the names of `index1` and `index2` to `x` and `y`, respectively. Note: The table may still be too wide, as some additional warnings are now generated during `mvn generate-sources`: ``` WARNING: Line 1 of a paragraph overflows the available area by more than 50 points. (See position 1:-1) Apr 30, 2020 8:44:03 PM org.apache.fop.events.LoggingEventListener processEvent WARNING: Line 1 of a paragraph overflows the available area by 468 millipoints. (See position 14259:-1) Apr 30, 2020 8:44:03 PM org.apache.fop.events.LoggingEventListener processEvent ... WARNING: span="inherit" on fo:block, but no explicit value found on the parent FO. ``` Fixes #20. Signed-off-by: Paul A. 
Clarke --- Intrinsics_Reference/ch_vec_reference.xml | 487 +++++++++++++++++++++- 1 file changed, 483 insertions(+), 4 deletions(-) diff --git a/Intrinsics_Reference/ch_vec_reference.xml b/Intrinsics_Reference/ch_vec_reference.xml index 0dccfd4..46ab8aa 100644 --- a/Intrinsics_Reference/ch_vec_reference.xml +++ b/Intrinsics_Reference/ch_vec_reference.xml @@ -24581,13 +24581,492 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.vec-ref"> Result value: For each i (0 ≤ i < 16), let - index1 be bits 0–3 and - index2 be bits 4–7 of byte element + x be bits 0–3 and + y be bits 4–7 of byte element i of c. Byte element i of r - is set to the exclusive-OR of byte elements index1 - of a and index2 + is set to the exclusive-OR of byte elements x + of a and y of b. + An example follows: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + a + + + F0 + + + F1 + + + F2 + + + F3 + + + F4 + + + F5 + + + F6 + + + F7 + + + F8 + + + F9 + + + FA + + + FB + + + FC + + + FD + + + FE + + + FF + + + + + b + + + FF + + + EF + + + DF + + + CF + + + BF + + + AF + + + 9F + + + 8F + + + 7F + + + 6F + + + 5F + + + 4F + + + 3F + + + 2F + + + 1F + + + 0F + + + + + c + + + 01 + + + 23 + + + 45 + + + 67 + + + 89 + + + AB + + + CD + + + EF + + + F0 + + + E1 + + + D2 + + + C3 + + + B4 + + + A5 + + + 96 + + + 87 + + + + + x + + + y + + + 0 + + + 1 + + + 2 + + + 3 + + + 4 + + + 5 + + + 6 + + + 7 + + + 8 + + + 9 + + + A + + + B + + + C + + + D + + + E + + + F + + + F + + + 0 + + + E + + + 1 + + + D + + + 2 + + + C + + + 3 + + + B + + + 4 + + + A + + + 5 + + + 9 + + + 6 + + + 8 + + + 7 + + + + + ax + + + by + + + F0 + + + EF + + + F2 + + + CF + + + F4 + + + AF + + + F6 + + + 8F + + + F8 + + + 6F + + + FA + + + 4F + + + FC + + + 2F + + + FE + + + 0F + + + FF + + + FF + + + FE + + + EF + + + FD + + + DF + + + FC + + + CF + + + FB + + + BF + + + FA + + + AF + + + F9 + + + 9F + + + F8 + + + 8F + + + + + r + + + 1F + + + 3D + + + 5B + + + 
79 + + + 97 + + + B5 + + + D3 + + + F1 + + + 00 + + + 11 + + + 22 + + + 33 + + + 44 + + + 55 + + + 66 + + + 77 + + + + + + Endian considerations: The element numbering within a register is left-to-right for big-endian targets, and right-to-left for little-endian targets. -- 2.34.1 From c67e95d81d4fe982f2708ccadc3053d685b0191e Mon Sep 17 00:00:00 2001 From: "Paul A. Clarke" Date: Fri, 1 May 2020 13:22:20 -0500 Subject: [PATCH 3/3] Add example for vec_pmsum_be `vec_pmsum_be` is complex enough to warrant an example. Fixes #33. Signed-off-by: Paul A. Clarke --- Intrinsics_Reference/ch_vec_reference.xml | 99 +++++++++++++++++++++++ 1 file changed, 99 insertions(+) diff --git a/Intrinsics_Reference/ch_vec_reference.xml b/Intrinsics_Reference/ch_vec_reference.xml index bf8e5b7..abd0455 100644 --- a/Intrinsics_Reference/ch_vec_reference.xml +++ b/Intrinsics_Reference/ch_vec_reference.xml @@ -24727,6 +24727,105 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.vec-ref"> i + 1 of a and b. + An example follows for inputs of type vector unsigned int: + + + + + + + + + + + + + a + + + A3000000 + + + 00A20000 + + + 0000A100 + + + 000000A0 + + + + + b + + + 00B30000 + + + 0000B200 + + + 000000B1 + + + B00000B0 + + + + + binary polynomial multiplicands + + + A3000000 + 00B30000 + + + 00A20000 + 0000B200 + + + 0000A100 + 000000B1 + + + 000000A0 + B00000B0 + + + + + intermediate results + XOR operands + + + 004E350000000000 + + + 0000004E24000000 + + + 00000000004E1100 + + + 0000004E00004E00 + + + + + r + + + 004E354E24000000 + + + 0000004E004E5F00 + + + + + + Endian considerations: All element numberings in the above description denote big-endian -- 2.34.1
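Reviewer note on the worked examples in patches 2/3 and 3/3: the two tables can be cross-checked with a small scalar model. The sketch below is not part of the patches and does not use the real intrinsics. The names `permxor_model`, `clmul32`, `pmsum_be_model`, and `examples_match` are hypothetical, and it assumes that array index `i` stands for element `i` in the numbering used by the descriptions (big-endian numbering for `vec_pmsum_be`) and that "bits 0-3" of a byte are its high-order nibble. It models only the result values, not register layout or instruction selection.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the vec_permxor description: for each byte i of c,
   x is the high nibble and y the low nibble; r[i] = a[x] ^ b[y]. */
void permxor_model(const uint8_t a[16], const uint8_t b[16],
                   const uint8_t c[16], uint8_t r[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned x = c[i] >> 4;   /* bits 0-3, counting from the MSB */
        unsigned y = c[i] & 0xF;  /* bits 4-7 */
        r[i] = a[x] ^ b[y];
    }
}

/* Carry-less (binary polynomial) multiply of two 32-bit values. */
uint64_t clmul32(uint32_t p, uint32_t q)
{
    uint64_t acc = 0;
    for (int bit = 0; bit < 32; bit++) {
        if ((q >> bit) & 1u)
            acc ^= (uint64_t)p << bit;  /* XOR replaces addition */
    }
    return acc;
}

/* Scalar model of vec_pmsum_be for vector unsigned int inputs:
   each doubleword of r is the XOR of two adjacent polynomial products. */
void pmsum_be_model(const uint32_t a[4], const uint32_t b[4], uint64_t r[2])
{
    for (int i = 0; i < 2; i++)
        r[i] = clmul32(a[2 * i], b[2 * i]) ^
               clmul32(a[2 * i + 1], b[2 * i + 1]);
}

/* Returns 1 if both models reproduce the rows of the example tables. */
int examples_match(void)
{
    /* Inputs and expected r row copied from the vec_permxor table. */
    const uint8_t pa[16] = {0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7,
                            0xF8, 0xF9, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF};
    const uint8_t pb[16] = {0xFF, 0xEF, 0xDF, 0xCF, 0xBF, 0xAF, 0x9F, 0x8F,
                            0x7F, 0x6F, 0x5F, 0x4F, 0x3F, 0x2F, 0x1F, 0x0F};
    const uint8_t pc[16] = {0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF,
                            0xF0, 0xE1, 0xD2, 0xC3, 0xB4, 0xA5, 0x96, 0x87};
    const uint8_t pr[16] = {0x1F, 0x3D, 0x5B, 0x79, 0x97, 0xB5, 0xD3, 0xF1,
                            0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77};
    uint8_t got8[16];
    permxor_model(pa, pb, pc, got8);

    /* Inputs and expected r row copied from the vec_pmsum_be table. */
    const uint32_t ma[4] = {0xA3000000u, 0x00A20000u, 0x0000A100u, 0x000000A0u};
    const uint32_t mb[4] = {0x00B30000u, 0x0000B200u, 0x000000B1u, 0xB00000B0u};
    uint64_t got64[2];
    pmsum_be_model(ma, mb, got64);

    return memcmp(got8, pr, sizeof pr) == 0 &&
           got64[0] == 0x004E354E24000000ULL &&
           got64[1] == 0x0000004E004E5F00ULL;
}
```

A caller could run `assert(examples_match());` from `main` to confirm that the `r` rows in both tables follow from the `a`, `b`, and `c` rows. Keeping the model in plain C rather than in intrinsics means the check can be built on any host, at the cost of saying nothing about endian-specific element numbering.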