<!--
  Copyright (c) 2019 OpenPOWER Foundation
  
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
  
-->
<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">
  
  <!-- Chapter Title goes here. -->
  <title>Vector Programming Techniques</title>

  <section>
    <title>Help the Compiler Help You</title>
    <para>
      The best way to use vector intrinsics is often <emphasis>not to
      use them at all</emphasis>.
    </para>
    <para>
      This may seem counterintuitive at first.  Aren't vector
      intrinsics the best way to ensure that the compiler does exactly
      what you want?  Well, sometimes.  But the problem is that the
      best instruction sequence today may not be the best instruction
      sequence tomorrow.  As the Power ISA moves forward, new
      instruction capabilities appear, and the old code you wrote can
      easily become obsolete.  Then you start having to create
      different versions of the code for different levels of the
      Power ISA, and it can quickly become difficult to maintain.
    </para>
    <para>
      Most often programmers use vector intrinsics to increase the
      performance of loop kernels that dominate the performance of an
      application or library.  However, modern compilers are often
      able to optimize such loops to use vector instructions without
      having to resort to intrinsics, using an optimization called
      autovectorization (or auto-SIMD).  Your first focus when writing
      loop kernels should be on making the code amenable to
      autovectorization by the compiler.  Start by writing the code
      naturally, using scalar memory accesses and data operations, and
      see whether the compiler autovectorizes your code.  If not, here
      are some steps you can try:
    </para>
    <itemizedlist>
      <listitem>
	<para>
	  <emphasis role="underline">Check your optimization
	  level</emphasis>.  Different compilers enable 
	  autovectorization at different optimization levels.  For
	  example, at this writing the GCC compiler requires
	  <code>-O3</code> to enable autovectorization by default.
	</para>
      </listitem>
      <listitem>
	<para>
	  <emphasis role="underline">Consider using
	  <code>-ffast-math</code></emphasis>.  This option assumes
	  that certain fussy aspects of IEEE floating-point can be
	  ignored, such as the presence of Not-a-Numbers (NaNs),
	  signed zeros, and so forth.  <code>-ffast-math</code> may
	  also change the precision of results, in ways that may not
	  matter to your application.  Turning on this option can
	  simplify the control flow of loops generated for your
	  application by removing tests for NaNs and so forth.  (Note
	  that <code>-Ofast</code> turns on both <code>-O3</code> and
	  <code>-ffast-math</code> in GCC.)
	</para>
      </listitem>
      <listitem>
	<para>
	  <emphasis role="underline">Align your data wherever
	  possible</emphasis>.  For most effective autovectorization,
	  arrays of data should be aligned on at least a 16-byte
	  boundary, and pointers to that data should be identified as
	  having the appropriate alignment.  For example:
	</para>
	<programlisting>  float fdata[4096] __attribute__((aligned(16)));</programlisting>
	<para>
	  ensures that the compiler can use an efficient, aligned
	  vector load to bring data from <code>fdata</code> into a
	  vector register.  Autovectorization will appear more
	  profitable to the compiler when data is known to be
	  aligned.
	</para>
	<para>
	  You can also declare pointers to point to aligned data,
	  which is particularly useful in function arguments:
	</para>
	<programlisting>  void foo (__attribute__((aligned(16))) double * aligned_fptr)</programlisting>
      </listitem>
      <listitem>
	<para>
	  <emphasis role="underline">Tell the compiler when data can't
	  overlap</emphasis>.  In C and C++, use of pointers can cause
	  compilers to pessimistically analyze which memory references
	  can refer to the same memory.  This can prevent important
	  optimizations, such as reordering memory references, or
	  keeping previously loaded values in registers rather than
	  reloading them.  Inefficiently optimized scalar loops are
	  less likely to be autovectorized.  You can annotate your
	  pointers with the <code>restrict</code> or
	  <code>__restrict__</code> keyword to tell the compiler that
	  your pointers don't "alias" with any other memory
	  references.  (<code>restrict</code> can be used only in C
	  when compiling for the C99 standard or later.
	  <code>__restrict__</code> is a language extension, available
	  in both GCC and Clang, that can be used for both C and C++.)
	</para>
	<para>
	  Suppose you have a function that takes two pointer
	  arguments, one that points to data your function writes to, and
	  one that points to data your function reads from.  By
	  default, the compiler may believe that the data being read
	  and written could overlap.  To disabuse the compiler of this
	  notion, do the following:
	</para>
	<programlisting>  void foo (double *__restrict__ outp, double *__restrict__ inp)</programlisting>
      </listitem>
    </itemizedlist>
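    <para>
      The steps above can be combined in a single loop kernel.  The
      following is a minimal sketch rather than a definitive recipe:
      the function name <code>scale_add</code> is hypothetical, and
      the example assumes GCC or Clang, both of which provide the
      <code>__builtin_assume_aligned</code> builtin as another way to
      state pointer alignment.  Compiling with <code>-O3</code> and
      asking the compiler for a vectorization report (for example,
      <code>-fopt-info-vec</code> in GCC or
      <code>-Rpass=loop-vectorize</code> in Clang) shows whether the
      loop was autovectorized.
    </para>
    <programlisting><![CDATA[  #include <stddef.h>

  /* Hypothetical kernel: y[i] = a * x[i] + y[i] for n elements.
     __restrict__ promises that x and y never overlap, and
     __builtin_assume_aligned promises that both pointers are
     16-byte aligned.  Breaking either promise is undefined
     behavior, so make them only when they are true.  */
  void
  scale_add (float *__restrict__ y, const float *__restrict__ x,
             float a, size_t n)
  {
    float *yp = __builtin_assume_aligned (y, 16);
    const float *xp = __builtin_assume_aligned (x, 16);

    /* Plain scalar code; the compiler is free to vectorize it.  */
    for (size_t i = 0; i < n; i++)
      yp[i] = a * xp[i] + yp[i];
  }]]></programlisting>
    <para>
      Callers are expected to pass arrays declared with
      <code>__attribute__((aligned(16)))</code>, or memory obtained
      from an allocator that guarantees 16-byte alignment.
    </para>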
  </section>

  <section>
    <title>Use Portable Intrinsics</title>
    <para>
      If you can't convince the compiler to autovectorize your code,
      or you want to access specific processor features not
      appropriate for autovectorization, you should use intrinsics.
      However, you should go out of your way to use intrinsics that
      are as portable as possible, in case you need to change
      compilers in the future.
    </para>
    <para>
      This reference provides intrinsics that are guaranteed to be
      portable across compliant compilers.  In particular, both the
      GCC and Clang compilers for Power implement the intrinsics in
      this manual.  The compilers may each implement many more
      intrinsics, but the ones in this manual are the only ones
      guaranteed to be portable.  So if you are using an interface not
      described here, you should look for an equivalent one in this
      manual and change your code to use that.
    </para>
    <para>
      Where an intrinsic may not be available from all compilers or at
      all ISA levels, this information is called out in the
      description of the intrinsic in <xref
      linkend="VIPR.reference.vecfns" />.
    </para>
    <para>
      There are also other vector APIs that may be of use to you (see
      <xref linkend="VIPR.techniques.apis" />).  In particular, the
      Power Vector Library (see <xref
      linkend="VIPR.techniques.pveclib" />) provides additional
      portability across compiler versions, as well as interfaces that
      hide cases where assembly language is needed.
    </para>
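    <para>
      As an illustration of the portable style, the following sketch
      adds two arrays of floats using only <code>vec_xl</code>,
      <code>vec_add</code>, and <code>vec_xst</code> from
      <code>&lt;altivec.h&gt;</code>.  The function name
      <code>add_floats</code> is hypothetical, and the
      <code>__VSX__</code> guard with a scalar fallback is an
      assumption made here so that the same source also builds on
      targets without VSX.
    </para>
    <programlisting><![CDATA[  #include <stddef.h>

  #if defined(__VSX__)
  #include <altivec.h>
  #endif

  /* Hypothetical helper: out[i] = a[i] + b[i] for n floats.  */
  void
  add_floats (float *out, const float *a, const float *b, size_t n)
  {
    size_t i = 0;

  #if defined(__VSX__)
    /* Four floats per iteration.  vec_xl and vec_xst accept
       unaligned addresses; their offset argument is in bytes.  */
    for (; i + 4 <= n; i += 4)
      {
        vector float va = vec_xl (0, a + i);
        vector float vb = vec_xl (0, b + i);
        vec_xst (vec_add (va, vb), 0, out + i);
      }
  #endif

    /* Scalar loop: handles the tail, or every element when VSX is
       not available.  */
    for (; i < n; i++)
      out[i] = a[i] + b[i];
  }]]></programlisting>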
  </section>

  <section>
    <title>Use Assembly Code Sparingly</title>
    <para>
      Sometimes the compiler will absolutely not cooperate in giving
      you the code you need.  You might not get the instruction you
      want, or you might get extra instructions that are slowing down
      your ideal performance.  When that happens, the first thing you
      should do is report this to the compiler community!  This will
      allow them to get the problem fixed in the next release of the
      compiler.
    </para>
    <para>
      In the meantime, though, what are your options?  As a
      workaround, your best option may be to use assembly code.  There
      are two ways to go about this.  Using inline assembly is
      generally appropriate only for very small snippets of code (1-5
      instructions, say).  If you want to write a whole function in
      assembly code, though, it is better to create a separate
      <code>.s</code> or <code>.S</code> file.  The only difference
      between these two file types is that a <code>.S</code> file is
      processed by the C preprocessor before being assembled.
    </para>
    <para>
      Assembly programming is beyond the scope of this manual.
      Getting inline assembly correct can be quite tricky, and it is
      best to look at existing examples to learn how to use it
      properly.  However, there is a good introduction to inline
      assembly in <emphasis>Using the GNU Compiler
      Collection</emphasis> (see <xref linkend="VIPR.intro.links" />),
      in section 6.47 at the time of this writing.
    </para>
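    <para>
      To give a feel for the shape of a small inline assembly snippet,
      here is a sketch that wraps a single <code>vadduwm</code>
      instruction.  It is purely illustrative: the names
      <code>add_words_asm</code> and <code>add4_words</code> are
      hypothetical, the preprocessor guard and scalar fallback are
      assumptions made so the source builds on other targets, and in
      real code <code>vec_add</code> would be the better choice for
      this operation.
    </para>
    <programlisting><![CDATA[  #include <string.h>

  #if defined(__powerpc64__) && defined(__ALTIVEC__)
  typedef __vector unsigned int v4u32;

  /* One instruction: add four 32-bit lanes, modulo 2^32.  The "v"
     constraint asks for a VMX register; "=v" marks the output.  */
  static inline v4u32
  add_words_asm (v4u32 a, v4u32 b)
  {
    v4u32 sum;
    __asm__ ("vadduwm %0,%1,%2" : "=v" (sum) : "v" (a), "v" (b));
    return sum;
  }
  #endif

  /* Hypothetical wrapper: out[i] = a[i] + b[i] for four words.  */
  void
  add4_words (unsigned int out[4], const unsigned int a[4],
              const unsigned int b[4])
  {
  #if defined(__powerpc64__) && defined(__ALTIVEC__)
    v4u32 va, vb, vs;
    memcpy (&va, a, sizeof va);
    memcpy (&vb, b, sizeof vb);
    vs = add_words_asm (va, vb);
    memcpy (out, &vs, sizeof vs);
  #else
    /* Fallback for targets without VMX.  */
    for (int i = 0; i < 4; i++)
      out[i] = a[i] + b[i];
  #endif
  }]]></programlisting>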
    <para>
      If you write a function entirely in assembly, you are
      responsible for following the calling conventions established by
      the ABI (see <xref linkend="VIPR.intro.links" />).  Again, it is
      best to look at examples.  One place to find well-written
      <code>.S</code> files is in the GLIBC project.
    </para>
  </section>

  <section xml:id="VIPR.techniques.apis">
    <title>Other Vector Programming APIs</title>
    <para>In addition to the intrinsic functions provided in this
    reference, programmers should be aware of other vector programming
    API resources.</para>
    <section>
      <title>x86 Vector Portability Headers</title>
      <para>
	Recent versions of the GCC and Clang open source compilers
	provide "drop-in" portability headers for portions of the
	Intel Architecture Instruction Set Extensions (see <xref
	linkend="VIPR.intro.links" />).  These headers mirror the APIs
	of Intel headers having the same names.  Support is provided
	for the MMX and SSE layers, up through SSE4.  At this time, no
	support for the AVX layers is envisioned.
      </para>
      <para>
	The portability headers provide the same semantics as the
	corresponding Intel APIs, but using VMX and VSX instructions
	to emulate the Intel vector instructions.  It should be
	emphasized that these headers are provided for portability,
	and will not necessarily perform optimally (although in many
	cases the performance is very good).  Using these headers is
	often a good first step in porting a library using Intel
	intrinsics to Power, after which more detailed rewriting of
	algorithms is usually desirable for best performance.
      </para>
      <para>
	Access to the portability APIs occurs automatically when
	including one of the corresponding Intel header files, such as
	<code>&lt;mmintrin.h&gt;</code>.
      </para>
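      <para>
	As a sketch of what this looks like in practice, the following
	function uses SSE intrinsics from
	<code>&lt;xmmintrin.h&gt;</code> and is intended to build
	unchanged on x86 and, through the portability headers, on
	Power.  The function name <code>add_floats_sse</code> is
	hypothetical.  The example assumes a Power target with VSX
	enabled, and it defines <code>NO_WARN_X86_INTRINSICS</code>
	because, at least in GCC, the Power versions of these headers
	stop with an error unless that macro is defined to acknowledge
	that the code is being ported.
      </para>
      <programlisting><![CDATA[  /* Acknowledge the porting notice in the Power wrapper headers.  */
  #if defined(__powerpc64__) && !defined(NO_WARN_X86_INTRINSICS)
  #define NO_WARN_X86_INTRINSICS 1
  #endif

  #include <stddef.h>
  #include <xmmintrin.h>

  /* Hypothetical helper: out[i] = a[i] + b[i] for n floats.  */
  void
  add_floats_sse (float *out, const float *a, const float *b,
                  size_t n)
  {
    size_t i = 0;

    /* Four floats per iteration, using unaligned loads and stores.  */
    for (; i + 4 <= n; i += 4)
      {
        __m128 va = _mm_loadu_ps (a + i);
        __m128 vb = _mm_loadu_ps (b + i);
        _mm_storeu_ps (out + i, _mm_add_ps (va, vb));
      }

    /* Scalar loop for any remaining elements.  */
    for (; i < n; i++)
      out[i] = a[i] + b[i];
  }]]></programlisting>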
    </section>
    <section xml:id="VIPR.techniques.pveclib">
      <title>The Power Vector Library (pveclib)</title>
      <para>The Power Vector Library, also known as
      <code>pveclib</code>, is a separate project available from
      GitHub (see <xref linkend="VIPR.intro.links" />).  The
      <code>pveclib</code> project builds on top of the intrinsics
      described in this manual to provide higher-level vector
      interfaces that are highly portable.  The goals of the project
      include:
      </para>
      <itemizedlist>
	<listitem>
	  <para>
	    Providing equivalent functions across versions of the
	    Power ISA.  For example, the <emphasis>Vector
	    Multiply-by-10 Unsigned Quadword</emphasis> operation
	    introduced in Power ISA 3.0 (POWER9) can be implemented
	    using a few vector instructions on earlier Power ISA
	    versions. 
	  </para>
	</listitem>
	<listitem>
	  <para>
	    Providing equivalent functions across compiler versions.
	    For example, intrinsics provided in later versions of the
	    compiler can be implemented as inline functions with
	    inline asm in earlier compiler versions.
	  </para>
	</listitem>
	<listitem>
	  <para>
	    Providing higher-order functions not provided directly by
	    the Power ISA.  One example is a vector SIMD implementation
	    for ASCII <code>__isalpha</code> and similar functions.
	    Another example is full <code>__int128</code>
	    implementations of <emphasis>Count Leading
	    Zeroes</emphasis>, <emphasis>Population Count</emphasis>,
	    and <emphasis>Multiply</emphasis>.
	  </para>
	</listitem>
      </itemizedlist>
    </section>
  </section>

</chapter>