diff --git a/Intrinsics_Reference/ch_biendian.xml b/Intrinsics_Reference/ch_biendian.xml
index 8b7a82a..8c4d76a 100644
--- a/Intrinsics_Reference/ch_biendian.xml
+++ b/Intrinsics_Reference/ch_biendian.xml
@@ -1505,6 +1505,245 @@ a[3] = c;
and the second integer element of b, in that order.
+
+ The big-endian result is {0x00010203, 0x1c1d1e1f, 0x0c0d0e0f,
+ 0x14151617}, as shown here:
+
+   a   00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
+   b   10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
+   c    0  1  2  3 28 29 30 31 12 13 14 15 20 21 22 23
+   t   00 01 02 03 1C 1D 1E 1F 0C 0D 0E 0F 14 15 16 17
+
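The big-endian selection can be reproduced without Power hardware. The following is a minimal sketch in portable C, not the intrinsic itself: `vperm_be`, `elem_be`, and `example_be` are hypothetical helper names, and the model assumes only that each PCV byte uses its low 5 bits to index the 32 bytes of a followed by b.

```c
#include <assert.h>
#include <stdint.h>

/* Model of vperm with big-endian byte numbering (an illustration, not the
   real instruction): indices 0-15 select from a, 16-31 from b, and only
   the low 5 bits of each PCV byte are used. */
static void vperm_be(const uint8_t a[16], const uint8_t b[16],
                     const uint8_t c[16], uint8_t t[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned idx = c[i] & 0x1Fu;
        t[i] = (idx < 16) ? a[idx] : b[idx - 16];
    }
}

/* Read 32-bit element n of a register image, most significant byte first. */
static uint32_t elem_be(const uint8_t r[16], int n)
{
    assert(n >= 0 && n < 4);
    const uint8_t *p = r + 4 * n;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Run the example from the text and return element n of the result t. */
static uint32_t example_be(int n)
{
    static const uint8_t c[16] = {0, 1, 2, 3, 28, 29, 30, 31,
                                  12, 13, 14, 15, 20, 21, 22, 23};
    uint8_t a[16], b[16], t[16];
    for (int i = 0; i < 16; i++) {
        a[i] = (uint8_t)i;          /* a = 00 01 ... 0F */
        b[i] = (uint8_t)(16 + i);   /* b = 10 11 ... 1F */
    }
    vperm_be(a, b, c, t);
    return elem_be(t, n);
}
```

Under these assumptions, `example_be(0)` through `example_be(3)` give the four integers of the big-endian result listed above.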
For little endian, the modified PCV is elementwise subtracted
from 31, giving {31,30,29,28,3,2,1,0,19,18,17,16,11,10,9,8}.
@@ -1515,10 +1754,247 @@ a[3] = c;
vperm instruction will again select entire
elements using the groups of 4 contiguous bytes, and the
values of the integers will be reordered without compromising
- each integer's contents. The fact that the little-endian
- result matches the big-endian result is left as an exercise
- for the reader.
+ each integer's contents. The little-endian result matches the
+ big-endian result, as shown. Observe that a and b switch
+ positions for little-endian code generation.
+
+   b   1C 1D 1E 1F 18 19 1A 1B 14 15 16 17 10 11 12 13
+   a   0C 0D 0E 0F 08 09 0A 0B 04 05 06 07 00 01 02 03
+   c    8  9 10 11 16 17 18 19  0  1  2  3 28 29 30 31
+   t   14 15 16 17 0C 0D 0E 0F 1C 1D 1E 1F 00 01 02 03
+
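The little-endian case can be modeled the same way. This sketch assumes the transformation described in the text (inputs swapped, each PCV element subtracted from 31) and a register image in which element 0 is the rightmost element. All helper names are hypothetical; this is an illustration of the idea, not the compiler's actual code generation.

```c
#include <assert.h>
#include <stdint.h>

/* vperm with big-endian byte numbering: indices 0-15 select from the
   first input, 16-31 from the second. */
static void vperm_be(const uint8_t x[16], const uint8_t y[16],
                     const uint8_t c[16], uint8_t t[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned idx = c[i] & 0x1Fu;
        t[i] = (idx < 16) ? x[idx] : y[idx - 16];
    }
}

/* Assumed little-endian register image: int element n occupies register
   bytes 4*(3-n) .. 4*(3-n)+3, most significant byte first. */
static void put_elem_le(uint8_t r[16], int n, uint32_t v)
{
    assert(n >= 0 && n < 4);
    uint8_t *p = r + 4 * (3 - n);
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_elem_le(const uint8_t r[16], int n)
{
    assert(n >= 0 && n < 4);
    const uint8_t *p = r + 4 * (3 - n);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Model of the little-endian handling: swap a and b, subtract the PCV
   from 31, then apply the big-endian-numbered vperm. */
static void model_vec_perm_le(const uint32_t a[4], const uint32_t b[4],
                              const uint8_t c[16], uint32_t t[4])
{
    uint8_t ra[16], rb[16], rc[16], rt[16];
    for (int n = 0; n < 4; n++) {
        put_elem_le(ra, n, a[n]);
        put_elem_le(rb, n, b[n]);
    }
    /* char element k of the PCV sits in register byte 15-k */
    for (int k = 0; k < 16; k++)
        rc[15 - k] = (uint8_t)(31 - (c[k] & 0x1F));
    vperm_be(rb, ra, rc, rt);   /* note: b first, then a */
    for (int n = 0; n < 4; n++)
        t[n] = get_elem_le(rt, n);
}

/* Element n of the little-endian result for the PCV used in the text. */
static uint32_t example_le(int n)
{
    static const uint32_t a[4] = {0x00010203u, 0x04050607u,
                                  0x08090a0bu, 0x0c0d0e0fu};
    static const uint32_t b[4] = {0x10111213u, 0x14151617u,
                                  0x18191a1bu, 0x1c1d1e1fu};
    static const uint8_t c[16] = {0, 1, 2, 3, 28, 29, 30, 31,
                                  12, 13, 14, 15, 20, 21, 22, 23};
    uint32_t t[4];
    assert(n >= 0 && n < 4);
    model_vec_perm_le(a, b, c, t);
    return t[n];
}
```

With this model, the four values returned by `example_le` equal the big-endian result, which is the property the paragraph above claims for a PCV that moves whole integers.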
Now, suppose instead that the original PCV does not reorder
entire integers at once:
@@ -1528,6 +2004,241 @@ a[3] = c;
The result of the big-endian implementation would be:
t = {0x00141f04, 0x07110613, 0x1e030208, 0x090d0516};
+
+   a   00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
+   b   10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
+   c    0 20 31  4  7 17  6 19 30  3  2  8  9 13  5 22
+   t   00 14 1F 04 07 11 06 13 1E 03 02 08 09 0D 05 16
+
For little-endian, the modified PCV would be
{31,11,0,27,24,14,25,12,1,28,29,23,22,18,26,9}, appearing in
@@ -1539,6 +2250,241 @@ a[3] = c;
which bears no resemblance to the big-endian result.
+
+   b   1C 1D 1E 1F 18 19 1A 1B 14 15 16 17 10 11 12 13
+   a   0C 0D 0E 0F 08 09 0A 0B 04 05 06 07 00 01 02 03
+   c    9 26 18 22 23 29 28  1 12 25 14 24 27  0 11 31
+   t   15 06 0E 0A 0B 01 00 1D 10 05 12 04 07 1C 17 03
+
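Whether a given PCV moves only whole elements can be checked mechanically before relying on it. The sketch below is one possible check, under the assumption that "whole element" means an aligned, contiguous, ascending run of byte indices. `pcv_is_elementwise`, `pcv_whole`, and `pcv_mixed` are hypothetical names, not part of any intrinsic API.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if every group of elem_size PCV bytes selects one whole,
   aligned source element (contiguous ascending byte indices), else 0.
   The PCV is given in big-endian element order, as written in source. */
static int pcv_is_elementwise(const uint8_t c[16], int elem_size)
{
    assert(elem_size == 1 || elem_size == 2 || elem_size == 4 ||
           elem_size == 8 || elem_size == 16);
    for (int i = 0; i < 16; i += elem_size) {
        unsigned first = c[i] & 0x1Fu;
        if (first % (unsigned)elem_size != 0)
            return 0;                       /* group starts mid-element */
        for (int j = 1; j < elem_size; j++)
            if ((c[i + j] & 0x1Fu) != first + (unsigned)j)
                return 0;                   /* bytes are not contiguous */
    }
    return 1;
}

/* The two PCVs used in the examples above. */
static const uint8_t pcv_whole[16] = {0, 1, 2, 3, 28, 29, 30, 31,
                                      12, 13, 14, 15, 20, 21, 22, 23};
static const uint8_t pcv_mixed[16] = {0, 20, 31, 4, 7, 17, 6, 19,
                                      30, 3, 2, 8, 9, 13, 5, 22};
```

For 4-byte integers, `pcv_is_elementwise(pcv_whole, 4)` returns 1 and `pcv_is_elementwise(pcv_mixed, 4)` returns 0, matching the two cases shown: the first PCV gives the same result under both endiannesses, and the second does not.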
The lesson here is to use vec_perm only to
reorder entire elements of a vector. If you must use vec_perm