Finish chapter 2.

pull/69/head
Bill Schmidt 5 years ago
parent c086fbb288
commit 7a3454dc78

@ -1034,18 +1034,149 @@ register vector double vd = vec_splats(*double_ptr);</programlisting>
</section>

<section>
<title>Examples and Limitations</title>
<section>
<title>Unaligned vector access</title>
<para>
A common programming error is to cast a pointer to a base type
(such as <code>int</code>) to a pointer of the corresponding
vector type (such as <code>vector int</code>), and then
dereference the pointer. This constitutes undefined behavior,
because it casts a pointer with a smaller alignment
requirement to a pointer with a larger alignment requirement.
Compilers may not produce the code that you expect in the
presence of undefined behavior.
</para>
<para>
Thus, do not write the following:
</para>
<programlisting> int a[4096];
vector int x = *((vector int *) a);</programlisting>
<para>
Instead, write this:
</para>
<programlisting> int a[4096];
vector int x = vec_xl (0, a);</programlisting>
</section>
<section>
<title>vec_sld is not bi-endian</title>
<para>
One oddity in the bi-endian vector programming model is that
<code>vec_sld</code> has big-endian semantics for code
compiled for both big-endian and little-endian targets. As a
result, any code that uses <code>vec_sld</code> without guarding
it with a test on endianness is likely to be incorrect.
</para>
<para>
At the time that the bi-endian model was being developed, it
was discovered that existing code in several Linux packages
was using <code>vec_sld</code> in order to perform multiplies,
or to otherwise shift portions of base elements left. A
straightforward little-endian implementation of
<code>vec_sld</code> would concatenate the two input vectors
in reverse order and shift bytes to the right. This would
only give compatible results for <code>vector char</code>
types. Those using this intrinsic as a cheap multiply, or to
shift bytes within larger elements, would see different
results on little-endian versus big-endian with such an
implementation. Therefore it was decided that
<code>vec_sld</code> would not have a bi-endian
implementation.
</para>
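<para>
The effect can be illustrated with a small model in portable C.
This is only a sketch, not AltiVec code: it assumes that
<code>vec_sld</code> always acts as a left shift of the
concatenated register images, as described above, and that a
little-endian register image holds element 3 leftmost. The helper
names and the operand-swapping guard are hypothetical, chosen for
illustration, and the guard is only meant for shifts by whole
32-bit elements.
</para>

```c
#include <stdint.h>
#include <string.h>

/* Model of the register operation: concatenate two 16-byte register
   images (leftmost byte first), shift left by n bytes (0..15), and
   keep the leftmost 16 bytes. */
static void sld_model(const uint8_t ra[16], const uint8_t rb[16],
                      unsigned n, uint8_t out[16])
{
    uint8_t cat[32];
    memcpy(cat, ra, 16);
    memcpy(cat + 16, rb, 16);
    memcpy(out, cat + n, 16);
}

/* Assumed register image of four 32-bit elements: big endian puts
   element 0 leftmost, little endian puts element 3 leftmost. Within
   an element the most significant byte is leftmost in both cases. */
static void ints_to_reg(const uint32_t e[4], int little_endian, uint8_t r[16])
{
    for (int i = 0; i < 4; i++) {
        int slot = little_endian ? 3 - i : i;
        r[4 * slot + 0] = (uint8_t)(e[i] >> 24);
        r[4 * slot + 1] = (uint8_t)(e[i] >> 16);
        r[4 * slot + 2] = (uint8_t)(e[i] >> 8);
        r[4 * slot + 3] = (uint8_t)(e[i]);
    }
}

static void reg_to_ints(const uint8_t r[16], int little_endian, uint32_t e[4])
{
    for (int i = 0; i < 4; i++) {
        int slot = little_endian ? 3 - i : i;
        e[i] = ((uint32_t)r[4 * slot + 0] << 24) |
               ((uint32_t)r[4 * slot + 1] << 16) |
               ((uint32_t)r[4 * slot + 2] << 8)  |
                (uint32_t)r[4 * slot + 3];
    }
}

/* Unguarded use: the same operands and shift count on both targets. */
void sld_unguarded(const uint32_t a[4], const uint32_t b[4], unsigned n,
                   int little_endian, uint32_t out[4])
{
    uint8_t ra[16], rb[16], rt[16];
    ints_to_reg(a, little_endian, ra);
    ints_to_reg(b, little_endian, rb);
    sld_model(ra, rb, n, rt);
    reg_to_ints(rt, little_endian, out);
}

/* Guarded use (hypothetical): drop the first k elements of a and append
   the first k elements of b, for k in 1..3. On little endian the operands
   are swapped and the byte count becomes 16 - 4*k. */
void shift_elems_guarded(const uint32_t a[4], const uint32_t b[4], unsigned k,
                         int little_endian, uint32_t out[4])
{
    uint8_t ra[16], rb[16], rt[16];
    ints_to_reg(a, little_endian, ra);
    ints_to_reg(b, little_endian, rb);
    if (little_endian)
        sld_model(rb, ra, 16 - 4 * k, rt);
    else
        sld_model(ra, rb, 4 * k, rt);
    reg_to_ints(rt, little_endian, out);
}
```

<para>
In this model, an unguarded shift by 4 bytes yields
{a[1], a[2], a[3], b[0]} for big endian but
{b[3], a[0], a[1], a[2]} for little endian, while the guarded
version yields {a[1], a[2], a[3], b[0]} for both.
</para>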
<para>
<code>vec_sro</code> is not bi-endian for similar reasons.
</para>
</section>

<section>
<title>Limitations on bi-endianness of vec_perm</title>
<para>
The <code>vec_perm</code> intrinsic is bi-endian, provided
that it is used to reorder entire elements of the input
vectors.
</para>
<para>
To see why, consider the code generated for the following:
</para>
<programlisting> vector int t;
vector int a = (vector int){0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f};
vector int b = (vector int){0x10111213, 0x14151617, 0x18191a1b, 0x1c1d1e1f};
vector unsigned char c = (vector unsigned char){0,1,2,3,28,29,30,31,12,13,14,15,20,21,22,23};
t = vec_perm (a, b, c);</programlisting>
<para>
For big endian, a compiler should generate:
</para>
<programlisting> vperm t,a,b,c</programlisting>
<para>
For little endian targeting a POWER8 system, a compiler should
generate:
</para>
<programlisting> vnand d,c,c
vperm t,b,a,d</programlisting>
<para>
For little endian targeting a POWER9 system, a compiler should
generate:
</para>
<programlisting> vpermr t,b,a,c</programlisting>
<para>
Note that the <code>vpermr</code> instruction itself performs
the modification of the permute control vector (PCV)
<code>c</code> that required a separate <code>vnand</code>
instruction for POWER8. The <code>vnand</code> of <code>c</code>
with itself complements every bit of the PCV. Because only the
bottom 5 bits of each element of the PCV are read by the
hardware, this has the effect of subtracting the original
elements of the PCV from 31.
</para>
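<para>
The arithmetic behind this can be checked in plain C. The sketch
below is not AltiVec code, and the function name is hypothetical;
it only shows that complementing a byte and keeping its low 5 bits
gives 31 minus the original low 5 bits.
</para>

```c
#include <stdint.h>

/* Index that vperm would read from a PCV byte after "vnand d,c,c":
   the byte is complemented, and only its low 5 bits are used. */
uint8_t pcv_index_after_vnand(uint8_t x)
{
    return (uint8_t)((uint8_t)(~x) & 31u);
}
```

<para>
For example, the PCV bytes 0, 28, 12, and 20 map to 31, 3, 19,
and 11, respectively.
</para>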
<para>
Note also that the PCV <code>c</code> has element values that
are contiguous in groups of 4. This selects entire elements
from the input vectors <code>a</code> and <code>b</code> to
reorder. Thus the intent of the code is to select the first
integer element of <code>a</code>, the last integer element of
<code>b</code>, the last integer element of <code>a</code>,
and the second integer element of <code>b</code>, in that
order.
</para>
<para>
For little endian, the original PCV is elementwise subtracted
from 31, giving the modified PCV {31,30,29,28,3,2,1,0,19,18,17,16,11,10,9,8}.
Since the elements appear in reverse order in a register when
loaded from little-endian memory, the elements appear in the
register from left to right as
{8,9,10,11,16,17,18,19,0,1,2,3,28,29,30,31}. So the following
<code>vperm</code> instruction will again select entire
elements using the groups of 4 contiguous bytes, and the
values of the integers will be reordered without compromising
each integer's contents. The fact that the little-endian
result matches the big-endian result is left as an exercise to
the reader.
</para>
<para>
Now, suppose instead that the original PCV does not reorder
entire integers at once:
</para>
<programlisting> vector unsigned char c = (vector unsigned char){0,20,31,4,7,17,6,19,30,3,2,8,9,13,5,22};</programlisting>
<para>
The result of the big-endian implementation would be:
</para>
<programlisting> t = {0x00141f04, 0x07110613, 0x1e030208, 0x090d0516};</programlisting>
<para>
For little-endian, the modified PCV would be
{31,11,0,27,24,14,25,12,1,28,29,23,22,18,26,9}, appearing in
the register as
{9,26,18,22,23,29,28,1,12,25,14,24,27,0,11,31}. The final
little-endian result would be
</para>
<programlisting> t = {0x071c1703, 0x10051204, 0x0b01001d, 0x15060e0a};</programlisting>
<para>
which bears no resemblance to the big-endian result.
</para>
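<para>
Both cases can be reproduced with a plain-C model of the
big-endian sequence (<code>vperm t,a,b,c</code>) and the POWER8
little-endian sequence (<code>vnand</code> followed by
<code>vperm t,b,a,d</code>). This is a sketch rather than
AltiVec code. It assumes that a little-endian register image
holds the elements in reverse order, as described above, and all
function names are hypothetical.
</para>

```c
#include <stdint.h>

/* Model of vperm: byte i of the result is byte (pcv[i] & 31) of the
   32-byte concatenation of ra and rb, leftmost byte first. */
static void vperm_model(const uint8_t ra[16], const uint8_t rb[16],
                        const uint8_t pcv[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned k = pcv[i] & 31u;
        out[i] = (k < 16) ? ra[k] : rb[k - 16];
    }
}

/* Register image of four 32-bit elements: element 0 is leftmost for
   big endian and rightmost for little endian. */
static void ints_to_reg(const uint32_t e[4], int le, uint8_t r[16])
{
    for (int i = 0; i < 4; i++) {
        int slot = le ? 3 - i : i;
        for (int j = 0; j < 4; j++)
            r[4 * slot + j] = (uint8_t)(e[i] >> (24 - 8 * j));
    }
}

static void reg_to_ints(const uint8_t r[16], int le, uint32_t e[4])
{
    for (int i = 0; i < 4; i++) {
        int slot = le ? 3 - i : i;
        e[i] = 0;
        for (int j = 0; j < 4; j++)
            e[i] = (e[i] << 8) | r[4 * slot + j];
    }
}

/* Big endian: vperm t,a,b,c */
void perm_be(const uint32_t a[4], const uint32_t b[4],
             const uint8_t c[16], uint32_t t[4])
{
    uint8_t ra[16], rb[16], rt[16];
    ints_to_reg(a, 0, ra);
    ints_to_reg(b, 0, rb);
    vperm_model(ra, rb, c, rt);   /* byte elements keep their order */
    reg_to_ints(rt, 0, t);
}

/* Little endian, POWER8: vnand d,c,c ; vperm t,b,a,d */
void perm_le(const uint32_t a[4], const uint32_t b[4],
             const uint8_t c[16], uint32_t t[4])
{
    uint8_t ra[16], rb[16], rd[16], rt[16];
    ints_to_reg(a, 1, ra);
    ints_to_reg(b, 1, rb);
    for (int i = 0; i < 16; i++)
        rd[i] = (uint8_t)~c[15 - i];  /* reversed in register, then complemented */
    vperm_model(rb, ra, rd, rt);
    reg_to_ints(rt, 1, t);
}
```

<para>
With the first PCV, both functions produce
{0x00010203, 0x1c1d1e1f, 0x0c0d0e0f, 0x14151617}. With the second
PCV, <code>perm_be</code> and <code>perm_le</code> produce the two
different results shown above.
</para>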
<para>
The lesson here is to use <code>vec_perm</code> only to
reorder entire elements of a vector. If you must use
<code>vec_perm</code> for another purpose, your code must
include a test for endianness and separate algorithms for
big-endian and little-endian targets.
</para>
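<para>
One way to check whether a given PCV falls into the safe
category is sketched below in plain C. The helper is hypothetical
and covers only vectors of 32-bit elements: it accepts a PCV when
each group of four bytes selects one whole, aligned element.
</para>

```c
#include <stdint.h>

/* Returns 1 if every group of four PCV bytes selects one whole 32-bit
   element (p, p+1, p+2, p+3 with p a multiple of 4), otherwise 0.
   Only the low 5 bits of each PCV byte are considered. */
int pcv_moves_whole_ints(const uint8_t pcv[16])
{
    for (int g = 0; g < 16; g += 4) {
        unsigned p = pcv[g] & 31u;
        if (p % 4 != 0)
            return 0;
        for (unsigned j = 1; j < 4; j++) {
            if ((pcv[g + j] & 31u) != p + j)
                return 0;
        }
    }
    return 1;
}
```

<para>
The first PCV in this section passes this check, and the second
one does not.
</para>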
</section>
</section>

</chapter>
