Vector Programming Techniques
Help the Compiler Help You
Start with scalar code, which is the most portable, and then help the compiler vectorize it. Wherever possible, align your data on 16-byte boundaries (the width of a vector register) and tell the compiler that it is aligned. Declare pointers with __restrict__ to promise the compiler that the data they reference does not alias, so it does not have to guard against overlapping accesses.
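A minimal sketch of these techniques, assuming GCC or Clang: the loop body stays scalar, the arrays are aligned with the GCC/Clang `aligned` attribute, and `__builtin_assume_aligned` and `__restrict__` pass the alignment and no-aliasing promises on to the auto-vectorizer. The names `add_arrays`, `a`, `b`, `c`, and `N` are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical array length, chosen only for illustration. */
#define N 1024

/* Storage aligned on 16-byte boundaries, the width of one vector
   register. The aligned attribute is a GCC/Clang extension. */
float a[N] __attribute__((aligned(16)));
float b[N] __attribute__((aligned(16)));
float c[N] __attribute__((aligned(16)));

/* A plain scalar loop that the compiler is free to auto-vectorize.
   __restrict__ promises that dst, x and y never overlap, and
   __builtin_assume_aligned (GCC/Clang) promises that each pointer is
   16-byte aligned, so the compiler does not need runtime overlap or
   alignment checks. Breaking either promise is undefined behavior. */
void add_arrays(float *__restrict__ dst,
                const float *__restrict__ x,
                const float *__restrict__ y,
                size_t n)
{
    /* These locals are based on the restrict parameters, so the
       no-alias promise still covers accesses made through them. */
    float *d = __builtin_assume_aligned(dst, 16);
    const float *xa = __builtin_assume_aligned(x, 16);
    const float *ya = __builtin_assume_aligned(y, 16);

    for (size_t i = 0; i < n; i++)
        d[i] = xa[i] + ya[i];
}
```

Whether the loop is actually vectorized depends on the compiler and optimization level; GCC can report vectorized loops with -fopt-info-vec, and Clang with -Rpass=loop-vectorize.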
Use Portable Intrinsics
Individual compilers may provide additional intrinsics, but only the intrinsics in this manual are guaranteed to be portable across compliant compilers. Some compilers also provide compatibility headers that let code written with another architecture's intrinsics build for Power; for example, recent GCC and Clang compilers support compatibility headers for the lower levels of the x86 vector architecture. These headers can ease initial porting, but for best performance, rewrite the performance-critical sections of code with native Power intrinsics.
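The following sketch shows what such a native rewrite might look like, assuming a compiler that defines `__ALTIVEC__` and provides `<altivec.h>` when vector support is enabled for Power. It adds two float arrays with the portable `vec_ld`, `vec_add`, and `vec_st` intrinsics and falls back to a scalar loop on other targets. The function `vadd_f32` is hypothetical, and it assumes 16-byte-aligned pointers and a length that is a multiple of 4.

```c
#include <stddef.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Hypothetical helper: dst[i] = x[i] + y[i] for i in [0, n).
   Assumes dst, x and y are 16-byte aligned and n is a multiple of 4. */
void vadd_f32(float *dst, const float *x, const float *y, size_t n)
{
#ifdef __ALTIVEC__
    /* Native Power path. vec_ld and vec_st access the 16-byte-aligned
       quadword containing the given address, which is why the pointers
       must be aligned. Each iteration handles 4 floats. */
    for (size_t i = 0; i < n; i += 4) {
        __vector float vx = vec_ld(0, &x[i]);
        __vector float vy = vec_ld(0, &y[i]);
        vec_st(vec_add(vx, vy), 0, &dst[i]);
    }
#else
    /* Scalar fallback so the function still builds on other targets. */
    for (size_t i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
#endif
}
```

An SSE loop built on _mm_load_ps, _mm_add_ps, and _mm_store_ps might first be compiled for Power through an x86 compatibility header and then, if it proves performance-critical, rewritten along these lines.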
Use Assembly Code Sparingly