How vector elements change size and type

Most vector built-ins return the same vector type as their (first) input parameter, but there are exceptions. Examples include conversions between types, compares, pack, unpack, merge, and integer multiply operations.

Converting floats to/from integer types will change the type and sometimes change the element size as well (double ↔ int and float ↔ long). For VMX the conversions are always the same size (float ↔ [unsigned] int). But VSX allows conversion of 64-bit (long or double) to/from 32-bit (float or int), with the inherent size changes. The PowerISA VSX defines a 4-element vector layout where little endian elements 0, 2 are used for input/output and elements 1, 3 are undefined. The OpenPOWER ABI Appendix A defines vec_double and vec_float with even/odd and high/low extensions as programming aids. These are not included in GCC 7 or earlier but are planned for GCC 8.

Compare operations produce either vector bool <input element type> (effectively bit masks) or predicates (the all or any condition code, represented as an int truth value). When a predicate compare (for example vec_all_eq or vec_any_gt) is used in an if statement, the condition code is used directly in the conditional branch and the int truth value is not generated.

Pack operations pack integer elements into the next smaller (half-size) integer elements. Pack operations include signed and unsigned saturate and unsigned modulo forms. As each packed element is half the size (in bits) of an input element, pack instructions require 2 vectors (256 bits) as input and generate a single 128-bit vector result.

Unpack operations expand integer elements into the next larger size elements. The integers are always treated as signed values and sign-extended. The processor design avoids instructions that return multiple register values, so the PowerISA defines unpack high and unpack low forms, where each instruction takes the high or low half of the vector elements and extends them to fill the vector output. Element order is maintained, and an unpack high/low sequence on the same input vector has the effect of unpacking to a 256-bit result in two vector registers.

Merge operations resemble shuffling two card decks (vectors) together, alternating cards (elements) in the result. As we are merging from 2 vectors (256 bits) into 1 vector (128 bits) and the elements do not change size, we have merge high and merge low instruction forms for each integer type (byte, halfword and word). The merge high operations alternate elements from the high half (the left side of the vector register) of the two input vectors. The merge low operations alternate elements from the low half (the right side of the vector register) of the two input vectors. For PowerISA 2.07 we added vector merge word even/odd instructions. Instead of the high or low elements, these shuffle the even- or odd-numbered elements of the two input vectors. Passing the same vector to both inputs of a merge produces splat-like results for each doubleword half, which is handy in some convert operations.

Integer multiply has the potential to generate twice as many bits in the product as in the inputs. A multiply of 2 int (32-bit) values produces a long (64-bit) result. Normal C language * operations ignore this and discard the top 32 bits of the result. However, in some computations it is useful to preserve the double-width product for intermediate computation before reducing the final result back to the original precision. The PowerISA VMX instruction set took the latter approach, i.e., it keeps all the product bits until the programmer explicitly asks for the truncated result (via the pack operation). So the vector integer multiply instructions are split into even/odd forms across signed and unsigned byte, halfword and word inputs. This requires two instructions (given the same inputs) to generate the full vector multiply across 2 vector registers and 256 bits. Again, as POWER processors are superscalar, this pair of instructions should execute in parallel. The set of expanded product values can either be used directly in further (double-width) computation or merged/packed into a single vector at the smaller bit size. This is what the compiler will generate for C vector extension multiply of vector integer types.
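To make the difference between compare masks and predicates concrete, here is a minimal portable C sketch. It deliberately does not use <altivec.h>: plain 4-element arrays stand in for vector int, and the functions cmpgt_mask, all_gt, any_gt and select_by_mask are hypothetical scalar stand-ins that only approximate what vec_cmpgt, vec_all_gt, vec_any_gt and a bitwise select such as vec_sel do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: a 4-element array stands in for a vector int. */

/* Element-wise signed "greater than": each mask element is all ones (true)
 * or all zeros (false), like the vector bool result of a vector compare. */
static void cmpgt_mask(const int32_t a[4], const int32_t b[4], uint32_t mask[4]) {
    for (int i = 0; i < 4; i++)
        mask[i] = (a[i] > b[i]) ? UINT32_MAX : 0u;
}

/* Predicate forms collapse the whole compare into one int truth value. */
static int all_gt(const int32_t a[4], const int32_t b[4]) {
    for (int i = 0; i < 4; i++)
        if (!(a[i] > b[i]))
            return 0;
    return 1;
}

static int any_gt(const int32_t a[4], const int32_t b[4]) {
    for (int i = 0; i < 4; i++)
        if (a[i] > b[i])
            return 1;
    return 0;
}

/* Because the mask is a bit mask, it can drive a bitwise select:
 * bits come from a where the mask is set and from b elsewhere. */
static void select_by_mask(const int32_t a[4], const int32_t b[4],
                           const uint32_t mask[4], int32_t out[4]) {
    for (int i = 0; i < 4; i++) {
        uint32_t bits = ((uint32_t)a[i] & mask[i]) | ((uint32_t)b[i] & ~mask[i]);
        /* Converting back assumes two's-complement wraparound (as GCC and Clang do). */
        out[i] = (int32_t)bits;
    }
}
```

With a greater-than mask, the select yields the element-wise maximum of a and b, which shows why compare results are full-width masks rather than single bits.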
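The three pack forms described above can be modeled in portable C for the word-to-halfword case. This is a sketch under stated assumptions: arrays stand in for vector registers, elements are numbered in array order with element 0 first, and the function names are hypothetical. The modulo form keeps only the low 16 bits, while the saturating forms clamp to the target range, which is roughly how vec_pack, vec_packs and vec_packsu behave on signed int inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of word -> halfword pack.
 * Two 4 x 32-bit inputs (256 bits) produce one 8 x 16-bit result (128 bits):
 * elements 0..3 of the result come from a, elements 4..7 from b. */

/* Unsigned modulo form: keep only the low-order 16 bits of each element. */
static void pack_modulo(const int32_t a[4], const int32_t b[4], uint16_t out[8]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint16_t)a[i];
        out[i + 4] = (uint16_t)b[i];
    }
}

/* Clamp a signed 32-bit value into the int16_t range. */
static int16_t sat_s16(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Clamp a signed 32-bit value into the uint16_t range. */
static uint16_t sat_u16(int32_t x) {
    if (x > UINT16_MAX) return UINT16_MAX;
    if (x < 0) return 0;
    return (uint16_t)x;
}

/* Signed saturate form: out-of-range values clamp to INT16_MIN/INT16_MAX. */
static void pack_signed_saturate(const int32_t a[4], const int32_t b[4], int16_t out[8]) {
    for (int i = 0; i < 4; i++) {
        out[i] = sat_s16(a[i]);
        out[i + 4] = sat_s16(b[i]);
    }
}

/* Unsigned saturate form on signed input: negatives become 0, large values 0xFFFF. */
static void pack_unsigned_saturate(const int32_t a[4], const int32_t b[4], uint16_t out[8]) {
    for (int i = 0; i < 4; i++) {
        out[i] = sat_u16(a[i]);
        out[i + 4] = sat_u16(b[i]);
    }
}
```

For example, 70000 packs to 4464 in the modulo form (the high bits are simply dropped), to 32767 in the signed saturate form, and to 65535 in the unsigned saturate form.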
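The unpack high/low pair described above can be sketched the same way for halfword-to-word unpack. The assumptions here: arrays stand in for vector registers, elements are numbered in array order, and "high" is taken to mean the first half (the left side of the register in the PowerISA's view). How this maps onto little-endian element numbering is left to the compiler and is not modeled. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of halfword -> word unpack.
 * "High" = elements 0..3 (left half), "low" = elements 4..7 (right half). */

/* Unpack high: sign-extend the first four 16-bit elements to 32 bits. */
static void unpack_high(const int16_t v[8], int32_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = v[i]; /* value-preserving int16_t -> int32_t, i.e. sign extension */
}

/* Unpack low: sign-extend the last four 16-bit elements to 32 bits. */
static void unpack_low(const int16_t v[8], int32_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = v[i + 4];
}
```

Calling both functions on the same input yields all eight elements widened to 32 bits across two 4-element results (256 bits), in the original element order.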
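The merge high/low and merge even/odd shuffles described above reduce to simple index patterns. The following is a hedged scalar model for word elements only, assuming array-order element numbering with "high" meaning elements 0..1; the function names are hypothetical and are not the <altivec.h> built-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of word merges on 4-element vectors.
 * Elements are numbered in array order; "high" = elements 0..1,
 * "low" = elements 2..3. out must not alias a or b. */

/* Merge high: a0, b0, a1, b1 */
static void merge_high(const uint32_t a[4], const uint32_t b[4], uint32_t out[4]) {
    assert(out != a && out != b);
    for (int i = 0; i < 2; i++) {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* Merge low: a2, b2, a3, b3 */
static void merge_low(const uint32_t a[4], const uint32_t b[4], uint32_t out[4]) {
    assert(out != a && out != b);
    for (int i = 0; i < 2; i++) {
        out[2 * i] = a[i + 2];
        out[2 * i + 1] = b[i + 2];
    }
}

/* Merge even: a0, b0, a2, b2 */
static void merge_even(const uint32_t a[4], const uint32_t b[4], uint32_t out[4]) {
    assert(out != a && out != b);
    for (int i = 0; i < 2; i++) {
        out[2 * i] = a[2 * i];
        out[2 * i + 1] = b[2 * i];
    }
}

/* Merge odd: a1, b1, a3, b3 */
static void merge_odd(const uint32_t a[4], const uint32_t b[4], uint32_t out[4]) {
    assert(out != a && out != b);
    for (int i = 0; i < 2; i++) {
        out[2 * i] = a[2 * i + 1];
        out[2 * i + 1] = b[2 * i + 1];
    }
}
```

Passing the same vector as both inputs, for example merge_high(a, a, out), gives {a0, a0, a1, a1}: each doubleword half holds one element repeated, the splat-like result mentioned above.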
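The even/odd multiply scheme, and its reduction back to the original element size, can be modeled for signed halfword inputs as follows. This is a sketch, not the compiler's actual instruction sequence: arrays stand in for vector registers, elements are numbered in array order, and mul_even, mul_odd and mul_truncated are hypothetical names. The interleave step plays the role of a merge high/low pair and the 16-bit truncation plays the role of a modulo pack.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of even/odd halfword multiply (8 x int16_t inputs).
 * Each form produces four full 32-bit products, so the pair together holds
 * all eight products across 256 bits. Elements are numbered in array order. */

/* Even form: products of elements 0, 2, 4, 6. */
static void mul_even(const int16_t a[8], const int16_t b[8], int32_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)a[2 * i] * b[2 * i]; /* |product| <= 2^30, no overflow */
}

/* Odd form: products of elements 1, 3, 5, 7. */
static void mul_odd(const int16_t a[8], const int16_t b[8], int32_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)a[2 * i + 1] * b[2 * i + 1];
}

/* Reduce back to 16-bit elements in the original order: interleave the even
 * and odd products (conceptually a merge) and keep the low 16 bits
 * (conceptually a modulo pack). This matches an element-wise C '*'
 * truncated to 16 bits. */
static void mul_truncated(const int16_t a[8], const int16_t b[8], uint16_t out[8]) {
    int32_t even[4], odd[4];
    mul_even(a, b, even);
    mul_odd(a, b, odd);
    for (int i = 0; i < 4; i++) {
        out[2 * i] = (uint16_t)even[i];
        out[2 * i + 1] = (uint16_t)odd[i];
    }
}
```

The even and odd results keep every product bit (300 * 300 = 90000 does not fit in 16 bits), which is the double-width intermediate described above; only the final reduction discards the high-order bits.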