Some more intrinsic examples

The intrinsic _mm_cvtpd_ps converts a packed vector double into a packed vector single float. Since only 2 doubles fit into a 128-bit vector, only 2 floats are returned, and they occupy only half (64 bits) of the XMM register. For this intrinsic the 64 bits are packed into the logical left half of the register and the logical right half of the register is set to zero (as per the Intel cvtpd2ps instruction).

The PowerISA provides the VSX Vector Round and Convert Double-Precision to Single-Precision format (xvcvdpsp) instruction. In the ABI this is vec_floato (vector double). This instruction converts each double element, then transfers converted element 0 to float element 1 and converted element 1 to float element 3. Float elements 0 and 2 are undefined (the hardware can do whatever it likes). This does not match the expected results for _mm_cvtpd_ps.

    vec_floato ({1.0, 2.0}) result = {<undefined>, 1.0, <undefined>, 2.0}
    _mm_cvtpd_ps ({1.0, 2.0}) result = {1.0, 2.0, 0.0, 0.0}

So we need to re-position the results to word elements 0 and 2, which allows a pack operation to deliver the correct format. A merge odd operation copies element 1 to element 0 and element 3 to element 2. The pack operation then combines the low half of each doubleword from the vector result and a vector of zeros to generate the required format.

This technique is also used to implement _mm_cvttpd_epi32, which converts a packed vector double into a packed vector int. The PowerISA instruction xvcvdpsxws uses a similar layout for the result as xvcvdpsp and requires the same fix-up.