From c7599451a7384e00f4961415ee2a78a9c8cef824 Mon Sep 17 00:00:00 2001
From: Jeff Scheel
Date: Fri, 23 Sep 2016 12:19:52 -0500
Subject: [PATCH] Preliminary documentation review updates

- Issue #2: Intel and Itanium trademark notices returned to ch_preface.xml.
- Issue #3: "About this Document" section returned to ch_preface.xml.
- Issue #4: Hyperlinks corrected in ch_1.xml.
- Issue #5: Removed "section" tags to promote Introduction to main chapter text in ch_1.xml.
- Issue #7: Corrected Table 2.10 borders in ch_2.xml.
- Issue #10: Manually added spacing prior to all bridgehead documentation to increase visible groupings in ch_2.xml and other sections.
- Issue #11: Corrected bullet nesting in ch_2.xml.

Additional, non-issue updates throughout document:
- Manual review and update of table sizes and spacing.
- Updates to program listings to ensure proper spacing.
- Change bars for POWER9 and erratum work added.

Signed-off-by: Jeff Scheel
---
 specification/app_a.xml        | 2229 +++++---------------------------------
 specification/app_b.xml        |   90 +-
 specification/app_glossary.xml |   74 +-
 specification/bk_main.xml      |   37 +-
 specification/ch_1.xml         |  124 +-
 specification/ch_2.xml         | 1879 ++++++++++++++++++---------
 specification/ch_3.xml         |  285 ++--
 specification/ch_4.xml         |  324 +++--
 specification/ch_5.xml         |   30 +-
 specification/ch_6.xml         |  103 +-
 specification/ch_preface.xml   |   57 +-
 specification/pom.xml          |    4 +-
 12 files changed, 2126 insertions(+), 3110 deletions(-)

diff --git a/specification/app_a.xml b/specification/app_a.xml
index 06f022a..6c6b5a1 100644
--- a/specification/app_a.xml
+++ b/specification/app_a.xml
@@ -23,11 +23,11 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> supported range leads to implementation-defined behavior. It is recommended that compilers generate a warning or error for out-of-range literals. - Vectors may be constructed from scalar values with a vector + Vectors may be constructed from scalar values with a vector constructor.
For example: (vector type){e1, e2, ..., en}. The values specified for each vector element can be either a compile-time constant or a runtime expression. - Floating-point vector built-in operators are controlled by the + Floating-point vector built-in operators are controlled by the rounding mode set for floating-point operations unless otherwise specified.
@@ -309,47 +309,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector float vec_abs (vector float); - - - VEC_ABSD (ARG1, ARG2) - POWER ISA 3.0 - - - Purpose: - Computes the absolute difference. - Result value: - Each element of the result contains the absolute difference - of the corresponding input elements using modulo - arithmetic. - - - - - POWER ISA 3.0 - - - vector unsigned char vec_absd (vector unsigned char, vector - unsigned char); - - - - - POWER ISA 3.0 - - - vector unsigned int vec_absd (vector unsigned int, vector - unsigned int); - - - - - POWER ISA 3.0 - - - vector unsigned short vec_absd (vector unsigned short, - vector unsigned short); - - VEC_ABSS (ARG1) @@ -800,14 +759,15 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> - Phased in. + Phased in. This optional function is being phased in, and it might not be - available on all implementations. Phased-in interfaces are optional - for the current generation of compliant systems. + available on all implementations. + Phased-in interfaces are optional + for the current generation of compliant systems. - + vector bool long long vec_and (vector bool long long, vector bool long long) @@ -1095,11 +1055,11 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Purpose: - Gathers up to 16 1-bit values from a quadword or from each + Gathers up to 16 1-bit values from a quadword or from each doubleword element in the specified order, zeroing other bits. Result value: - When the type of ARG1 is vector unsigned __int128: + When the type of ARG1 is vector unsigned __int128: For each i (0 ≤ i < 16), let bit index j denote the @@ -1117,9 +1077,9 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> All other bits are zeroed. 
- When the type of ARG1 is vector unsigned char or vector + When the type of ARG1 is vector unsigned char or vector unsigned long long: - + For each doubleword element i (0 ≤ i < 2) of ARG1, regardless of the input operand type specified for @@ -1144,15 +1104,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> - - - POWER ISA 3.0 - - - vector unsigned char vec_bperm (vector unsigned char, - vector unsigned char); - - @@ -1162,7 +1113,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> __int128, vector unsigned char); - + @@ -1260,7 +1211,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Otherwise, the value of each bit is 0. - + @@ -1287,7 +1238,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned char); - + @@ -1314,7 +1265,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned int); - + @@ -1341,7 +1292,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned long long); - + @@ -1825,7 +1776,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Otherwise, the value of each bit is 0. - + @@ -1852,7 +1803,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned char); - + @@ -1879,7 +1830,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned int); - + @@ -1906,7 +1857,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned long long); - + @@ -1951,77 +1902,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> float); - - - VEC_CMPNEZ (ARG1, ARG2) - POWER ISA 3.0 - - - Purpose: - Returns a vector containing the results of comparing each - set of corresponding elements of the given vectors for inequality - or for an element with a 0 value. - Result value: - For each element of the result, the value of each bit is 1 - if the corresponding elements of ARG1 and - ARG2 are not equal, or if the ARG1 element or the ARG2 - element is 0. Otherwise, the value of each bit is 0. 
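The VEC_CMPNEZ rule above can be checked against a portable scalar model. This is a sketch under stated assumptions, not the `altivec.h` built-in. `cmpnez_u8` is a hypothetical name. A 16-element byte vector is represented as a plain `uint8_t[16]` array with element i at index i. Each result element is written as 0xFF (all bits 1) or 0x00 (all bits 0), as in a `vector bool char` result.

```c
#include <stdint.h>

/* Hypothetical scalar model of vec_cmpnez on 16 byte elements.
   Result element i is 0xFF if a[i] != b[i], or if a[i] == 0 or b[i] == 0;
   otherwise it is 0x00. */
void cmpnez_u8(const uint8_t a[16], const uint8_t b[16], uint8_t r[16])
{
    for (int i = 0; i < 16; i++) {
        int hit = (a[i] != b[i]) || (a[i] == 0) || (b[i] == 0);
        r[i] = hit ? 0xFF : 0x00;
    }
}
```

A zero in either input also sets the result element. As a result, the first nonzero result element marks either the first difference or the first NUL terminator, which makes this comparison convenient for string scanning.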
- - - - - POWER ISA 3.0 - - - vector bool char vec_cmpnez (vector signed char, vector - signed char); - - - - - POWER ISA 3.0 - - - vector bool char vec_cmpnez (vector unsigned char, vector - unsigned char); - - - - - POWER ISA 3.0 - - - vector bool int vec_cmpnez (vector signed int, vector - signed int); - - - - - POWER ISA 3.0 - - - vector bool int vec_cmpnez (vector unsigned int, vector - unsigned int); - - - - - POWER ISA 3.0 - - - vector bool short vec_cmpnez (vector signed short, vector - signed short); - - - - - POWER ISA 3.0 - - - vector bool short vec_cmpnez (vector unsigned short, vector - unsigned short); - - VEC_CNTLZ (ARG1) @@ -2121,154 +2001,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> short); - - - VEC_CNTLZ_LSBB (ARG1) - POWER ISA 3.0 - - - Purpose: - Returns the number of leading byte elements (starting at - the lowest-numbered element) of a vector that have a - least-significant bit of 0. - Result value: - The number of leading byte elements (starting at the - lowest-numbered element) of a vector that have a - least-significant bit of 0. - - - - - POWER ISA 3.0 - - - signed int vec_cntlz_lsbb (vector signed char); - - - - - POWER ISA 3.0 - - - signed int vec_cntlz_lsbb (vector unsigned char); - - - - - VEC_CNTTZ (ARG1) - POWER ISA 3.0 - - - Purpose: - Returns a vector containing the number of least-significant - bits equal to 0 of each corresponding element of the given - vector. - Result value: - The value of each element of the result is set to the - number of trailing zeros of the corresponding element of - ARG1. 
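The per-element counts described for VEC_CNTLZ_LSBB and VEC_CNTTZ can be sketched in scalar C. `cnttz_u32` and `cntlz_lsbb` are hypothetical names. The sketch assumes that byte element 0 corresponds to array index 0, and that an all-zero 32-bit element yields the element width, 32.

```c
#include <stdint.h>

/* Hypothetical model of vec_cnttz for one 32-bit element:
   the number of trailing (least-significant) zero bits; 32 for an input of 0. */
unsigned cnttz_u32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical model of vec_cntlz_lsbb: count the leading byte elements,
   starting at element 0, whose least-significant bit is 0. */
int cntlz_lsbb(const uint8_t v[16])
{
    int n = 0;
    while (n < 16 && (v[n] & 1u) == 0)
        n++;
    return n;
}
```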
- - - - - POWER ISA 3.0 - - - vector signed char vec_cnttz (vector signed char); - - - - - POWER ISA 3.0 - - - vector unsigned char vec_cnttz (vector unsigned - char); - - - - - POWER ISA 3.0 - - - vector signed int vec_cnttz (vector signed int); - - - - - POWER ISA 3.0 - - - vector unsigned int vec_cnttz (vector unsigned int); - - - - - POWER ISA 3.0 - - - vector signed long long vec_cnttz (vector signed long - long); - - - - - POWER ISA 3.0 - - - vector unsigned long long vec_cnttz (vector unsigned long - long); - - - - - POWER ISA 3.0 - - - vector signed short vec_cnttz (vector signed short); - - - - - POWER ISA 3.0 - - - vector unsigned short vec_cnttz (vector unsigned - short); - - - - - VEC_CNTTZ_LSBB (ARG1) - POWER ISA 3.0 - - - Purpose: - Returns the number of trailing byte elements (starting at - the highest-numbered element) of a vector that have a - least-significant bit of 0. - Result value: - The number of trailing byte elements (starting at the - highest-numbered element) of a vector that have a - least-significant bit of 0. - - - - - POWER ISA 3.0 - - - signed int vec_cnttz_lsbb (vector signed char); - - - - - POWER ISA 3.0 - - - signed int vec_cnttz_lsbb (vector unsigned char); - - VEC_CPSGN(ARG1, ARG2) @@ -2315,7 +2047,9 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector. Result value: The value of each element of the result is the closest - floating-point approximation of the value of the corresponding + floating-point + approximationestimate + of the value of the corresponding element of ARG1 divided by 2 to the power of ARG2, which should be in the range 0 - 31. @@ -2346,7 +2080,8 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Converts a real vector into a vector signed int. 
Result value: The value of each element of the result is the saturated - signed-integer value, truncated towards zero, obtained by + signed-integer value, + truncated towards zero, obtained by multiplying the corresponding element of ARG1 by 2 to the power of ARG2, which should be in the range 0 - 31. @@ -2368,7 +2103,8 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Converts a real vector into a vector unsigned int. Result value: The value of each element of the result is the saturated - unsigned-integer value, truncated towards zero, obtained by + unsigned-integer value, + truncated towards zero, obtained by multiplying the corresponding element of ARG1 by 2 to the power of ARG2, which should be in the range 0 - 31. @@ -2420,10 +2156,10 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> VEC_DOUBLE (ARG1) - Purpose: + Purpose: Converts a vector of long integers into a vector of double-precision numbers. - Result value: + Result value: Target elements are computed by converting the respective input elements. @@ -2488,8 +2224,10 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Purpose: - Converts an input vector to a vector of double-precision - floating-point numbers. + Converts + an input vectora vector of integers + to a vector of double-precision + floating-point numbers. Result value: Target elements 0 and 1 are set to the converted values of source elements 0 and 1. @@ -2525,8 +2263,10 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Purpose: - Converts an input vector to a vector of double-precision - floating-point numbers. + Converts + an input vectora vector of integers + to a vector of double-precision + floating-point numbers. Result value: Target elements 0 and 1 are set to the converted values of source elements 2 and 3. 
@@ -2645,7 +2385,8 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> - vector signed int vec_eqv (vector signed int, vector signed + vector signed int vec_eqv (vector signed int, vector + signed int); @@ -2700,7 +2441,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector signed short vec_eqv (vector signed short, vector - signed short); + signed short); @@ -2892,534 +2633,93 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> - POWER ISA 3.0 - Phased in. - - - - - _Float16 vec_extract (vector _Float16, signed int); - - - - - VEC_EXTRACT_EXP (ARG1) - POWER ISA 3.0 + VEC_FLOAT (ARG1) Purpose: - Extracts an exponent from a floating-point number. + Converts a vector of integers to a vector of + single-precision floating-point numbers. Result value: - Each element of the returned integer vector is extracted - from the exponent field of the corresponding floating-point - vector element. - The extracted exponent of ARG1 is returned as a - right-justified unsigned integer containing a biased exponent, in - accordance with the exponent representation specified by IEEE - 754, without further processing. + Target elements are obtained by converting the respective + source elements to unsigned integers. - POWER ISA 3.0 + - vector unsigned long long vec_extract_exp (vector - double); + vector float + vec_float (vector signed int); - POWER ISA 3.0 + - vector unsigned int vec_extract_exp (vector float); + vector float + vec_float (vector unsigned int); - VEC_EXTRACT_FP32_FROM_ - SHORTH (ARG1) - POWER ISA 3.0 + VEC_FLOAT2 (ARG1, ARG2) Purpose: - Extracts four single-precision floating-point numbers from - the high elements of a vector of eight 16-bit elements, - interpreting each element as a 16-bit floating-point number in - IEEE format. + Converts + an input vectora vector of integers + to a vector of single-precision + numbers floating-point numbers. 
Result value: - The first four elements are interpreted as 16-bit - floating-point numbers in IEEE format, and extended to - single-precision format, returning a vector with four - single-precision IEEE numbers. + Target elements are obtained by converting the source + elements to single-precision numbers as follows: + + + Target elements 0 and 1 from source 0 + + + Target elements 2 and 3 from source 1 + + - POWER ISA 3.0 + - vector float vec_extract_fp32_from_shorth (vector unsigned - short); + vector float vec_float2 (vector signed long long, vector + signed long long); - VEC_EXTRACT_FP32_FROM_ - SHORTL (ARG1) - POWER ISA 3.0 + - Purpose - Extracts four single-precision floating-point numbers from - the low elements of a vector of eight 16-bit elements, - interpreting each element as a 16-bit floating-point number in - IEEE format. - Result value: - The last four elements are interpreted as 16-bit - floating-point numbers in IEEE format, and extended to - single-precision format, returning a vector with four - single-precision IEEE numbers. + vector float vec_float2 (vector unsigned long long, vector + unsigned long long); - POWER ISA 3.0 + - vector float vec_extract_fp32_from_shortl (vector unsigned - short); + vector float vec_float2 (vector double, vector + double); - VEC_EXTRACT_SIG (ARG1) - POWER ISA 3.0 + VEC_FLOATE (ARG2) Purpose: - Extracts a significand (mantissa) from a floating-point - number. - Result value: - Each element of the returned integer vector is extracted - from the significand (mantissa) field of the corresponding - floating-point vector element. - The significand is from the corresponding floating-point - number in accordance with the IEEE format. The returned result - includes the implicit leading digit. The value of that digit is - not encoded in the IEEE format, but is implied by the - exponent. 
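The exponent and significand fields described for VEC_EXTRACT_EXP and VEC_EXTRACT_SIG can be modeled for one IEEE 754 binary64 element with plain bit manipulation. `extract_exp_f64` and `extract_sig_f64` are hypothetical names. The text above says only that the leading digit is implied by the exponent. This sketch therefore assumes the implicit bit is 1 only for normal numbers, that is, when the biased exponent is neither 0 nor 0x7FF.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of an IEEE 754 binary64 value. */
static uint64_t bits_of(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Hypothetical model of vec_extract_exp for one double element:
   the 11-bit biased exponent, right-justified, with no unbiasing. */
uint64_t extract_exp_f64(double d)
{
    return (bits_of(d) >> 52) & 0x7FF;
}

/* Hypothetical model of vec_extract_sig for one double element:
   the 52-bit fraction plus the implicit leading bit at bit 52.
   Assumption: the implicit bit is 1 only when the biased exponent
   is neither 0 nor 0x7FF. */
uint64_t extract_sig_f64(double d)
{
    uint64_t u = bits_of(d);
    uint64_t exp = (u >> 52) & 0x7FF;
    uint64_t frac = u & 0x000FFFFFFFFFFFFFull;
    if (exp != 0 && exp != 0x7FF)
        frac |= 1ull << 52;
    return frac;
}
```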
- - - - - POWER ISA 3.0 - - - vector unsigned long long vec_extract_sig (vector - double) - - - - - POWER ISA 3.0 - - - vector unsigned int vec_extract_sig (vector float) - - - - - VEC_EXTRACT4B (ARG1, ARG2) - POWER ISA 3.0 - - - Purpose: - Extracts a word from a vector at a byte position. - Result value: - The first doubleword element of the result contains the - zero-extended extracted word from ARG1. The second doubleword is - set to 0. ARG2 specifies the least-significant byte number (0 - - 12) of the word to be extracted. - - - - - POWER ISA 3.0 - - - vector unsigned long long vec_extract4b (vector unsigned - char, const int) - - - - - VEC_FIRST_MATCH_INDEX (ARG1, ARG2) - POWER ISA 3.0 - - - Purpose: - Performs a comparison of equality on each of the - corresponding elements of ARG1 and ARG2, and returns the first - position of equality. - Result value: - Returns the element index of the position of the first - character match. If no match, returns the number of characters as - an element count in the vector argument. - - - - - POWER ISA 3.0 - - - unsigned int vec_first_match_index (vector signed char, - vector signed char); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_match_index (vector unsigned char, - vector unsigned char); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_match_index (vector signed int, - vector signed int); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_match_index (vector unsigned int, - vector unsigned int); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_match_index (vector signed short, - vector signed short); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_match_index (vector unsigned short, - vector unsigned short); - - - - - VEC_FIRST_MATCH_OR_EOS_ INDEX (ARG1, ARG2) - POWER ISA 3.0 - - - Purpose: - Performs a comparison of equality on each of the - corresponding elements of ARG1 and ARG2. Returns the first - position of equality, or the zero string terminator. 
- Result value: - Returns the element index of the position of either the - first character match or an end-of-string (EOS) terminator. If no - match or terminator, returns the number of characters as an - element count in the vector argument. - - - - - POWER ISA 3.0 - - - unsigned int vec_first_match_or_eos_index (vector signed - char, vector signed char); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_match_or_eos_index (vector unsigned - char, vector unsigned char); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_match_or_eos_index (vector signed - int, vector signed int); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_match_or_eos_index (vector unsigned - int, vector unsigned int); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_match_or_eos_index (vector signed - short, vector signed short); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_match_or_eos_index (vector unsigned - short, vector unsigned short); - - - - - VEC_FIRST_MISMATCH_INDEX(ARG1, ARG2) - POWER ISA 3.0 - - - Purpose: - Performs a comparison of inequality on each of the - corresponding elements of ARG1 and ARG2, and returns the first - position of inequality. - Result value: - Returns the element index of the position of the first - character mismatch. If no mismatch, returns the number of - characters as an element count in the vector argument. 
- - - - - POWER ISA 3.0 - - - unsigned int vec_first_mismatch_index (vector signed char, - vector signed char); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_mismatch_index (vector unsigned - char, vector unsigned char); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_mismatch_index (vector signed int, - vector signed int); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_mismatch_index (vector unsigned int, - vector unsigned int); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_mismatch_index (vector signed short, - vector signed short); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_mismatch_index (vector unsigned - short, vector unsigned short); - - - - - VEC_FIRST_MISMATCH_OR_ EOS_INDEX (ARG1, ARG2) - POWER ISA 3.0 - - - Purpose: - Performs a comparison of inequality on each of the - corresponding elements of ARG1 and ARG2. Returns the first - position of inequality, or the zero string terminator. - Result value: - Returns the element index of the position of either the - first character mismatch or an end-of-string (EOS) terminator. If - no mismatch or terminator, returns the number of characters as an - element count in the vector argument. 
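A scalar model over 16 byte elements makes the difference between VEC_FIRST_MISMATCH_INDEX and VEC_FIRST_MISMATCH_OR_EOS_INDEX easy to see. Both function names are hypothetical, and element i is assumed to be array index i. An end-of-string terminator is assumed to be a zero element in either argument.

```c
#include <stdint.h>

/* Hypothetical model of vec_first_mismatch_index on 16 byte elements.
   Returns 16 (the element count) when every element matches. */
unsigned first_mismatch_index_u8(const uint8_t a[16], const uint8_t b[16])
{
    unsigned i;
    for (i = 0; i < 16; i++)
        if (a[i] != b[i])
            break;
    return i;
}

/* Hypothetical model of vec_first_mismatch_or_eos_index.
   Assumption: a terminator is a zero element in either argument.
   Returns 16 when there is neither a mismatch nor a terminator. */
unsigned first_mismatch_or_eos_index_u8(const uint8_t a[16], const uint8_t b[16])
{
    unsigned i;
    for (i = 0; i < 16; i++)
        if (a[i] != b[i] || a[i] == 0 || b[i] == 0)
            break;
    return i;
}
```

Consider two identical zero-padded strings. The plain variant scans past the terminator and returns 16, while the EOS variant stops at the terminator's index.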
- - - - - POWER ISA 3.0 - - - unsigned int vec_first_mismatch_or_eos_index (vector signed - char, vector signed char); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_mismatch_or_eos_index (vector - unsigned char, vector unsigned char); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_mismatch_or_eos_index (vector signed - int, vector signed int); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_mismatch_or_eos_index (vector - unsigned int, vector unsigned int); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_mismatch_or_eos_index (vector signed - short, vector signed short); - - - - - POWER ISA 3.0 - - - unsigned int vec_first_mismatch_or_eos_index (vector - unsigned short, vector unsigned short); - - - - - VEC_FLOAT (ARG1) - - - Purpose: - Converts a vector of integers to a vector of - single-precision floating-point numbers. - Result value: - Target elements are obtained by converting the respective - source elements to unsigned integers. - - - - - - - - vector float vec_float (vector signed int); - - - - - - - - vector float vec_float (vector unsigned int); - - - - - VEC_FLOAT2 (ARG1, ARG2) - - - Purpose: - Converts an input vector to a vector of single-precision - numbers floating-point numbers. - Result value: - Target elements are obtained by converting the source - elements to single-precision numbers as follows: - - - Target elements 0 and 1 from source 0 - - - Target elements 2 and 3 from source 1 - - - - - - - - - - vector float vec_float2 (vector signed long long, vector - signed long long); - - - - - - - - vector float vec_float2 (vector unsigned long long, vector - unsigned long long); - - - - - - - - vector float vec_float2 (vector double, vector - double); - - - - - VEC_FLOATE (ARG2) - - - Purpose: - Converts an input vector to a vector of single-precision - numbers. + Converts an input vector to a vector of single-precision + numbers. 
Result value: The even-numbered target elements are obtained by converting the source elements to single-precision @@ -3450,56 +2750,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector float vec_floate (vector double); - - - VEC_FLOATH (ARG2) - POWER ISA 3.0 - Phased in. - - - - - Purpose: - Converts a vector to a vector of single-precision - floating-point numbers. - Result value: - Target elements 0 through 3 are set to the converted values - of source elements 0 through 3, respectively. - - - - - - - - vector float vec_floath (vector _Float16); - - - - - VEC_FLOATL (ARG2) - POWER ISA 3.0 - Phased in. - - - - - Purpose: - Converts a vector to a vector of single-precision - floating-point numbers. - Result value: - Target elements 0 through 3 are set to the converted values - of source elements 4 through 7, respectively. - - - - - - - - vector float vec_floatl (vector _Float16); - - VEC_FLOATO (ARG2) @@ -3649,154 +2899,54 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> - vector signed long long vec_insert (signed long long, - vector signed long long, signed int); - - - - - - - - vector unsigned long long vec_insert (unsigned long long, - vector unsigned long long, signed int); - - - - - - - - vector signed short vec_insert (signed short, vector signed - short, signed int); - - - - - - - - vector unsigned short vec_insert (unsigned short, vector - unsigned short, signed int); - - - - - - - - vector double vec_insert (double, vector double, signed - int); - - - - - - - - vector float vec_insert (float, vector float, signed - int); - - - - - POWER ISA 3.0 - Phased in. - - - - - vector _Float16 vec_insert (_Float16, vector _Float16, - signed int); - - - - - VEC_INSERT_EXP (ARG1, ARG2) - POWER ISA 3.0 - - - Purpose: - Inserts an exponent into a floating-point number. 
- Result value: - Each element of the returned floating-point vector is - generated by combining the exponent specified by the - corresponding element of ARG2 with the sign and significand of - the corresponding element of ARG1. - The inserted exponent of ARG2 is treated as a - right-justified unsigned integer containing a biased exponent, in - accordance with the exponent representation specified by IEEE - 754. It is combined with the sign and significand of ARG1 without - further processing. - - - - - POWER ISA 3.0 - - - vector double vec_insert_exp (vector double, vector - unsigned long long); - - - - - POWER ISA 3.0 - - - vector double vec_insert_exp (vector unsigned long long, - vector unsigned long long); + vector signed long long vec_insert (signed long long, + vector signed long long, signed int); - POWER ISA 3.0 + - vector float vec_insert_exp (vector float, vector unsigned - int); + vector unsigned long long vec_insert (unsigned long long, + vector unsigned long long, signed int); - POWER ISA 3.0 + - vector float vec_insert_exp (vector unsigned int, vector - unsigned int); + vector signed short vec_insert (signed short, vector signed + short, + signed int); - VEC_INSERT4B (ARG1, ARG2, ARG3) - POWER ISA 3.0 + - Purpose: - Inserts a word into a vector at a byte position. - Result value: - The first doubleword element of the result contains the - zero-extended extracted word from ARG1. The second doubleword is - set to 0. ARG2 specifies the least-significant byte (0 - 12) of - the extracted word.
+ vector unsigned short vec_insert (unsigned short, vector + unsigned short, signed int); - POWER ISA 3.0 + - vector unsigned char vec_insert4b (vector signed int, - vector unsigned char, const int) + vector double vec_insert (double, vector double, signed + int); - POWER ISA 3.0 + - vector unsigned char vec_insert4b (vector unsigned int, - vector unsigned char, const int) + vector float vec_insert (float, vector float, signed + int); @@ -4275,18 +3425,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> float); - - - POWER ISA 3.0 - Phased in. - - - - - vector _Float16 vec_mergeh (vector _Float16, vector - _Float16); - - VEC_MERGEL (ARG1, ARG2) @@ -4442,18 +3580,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> float); - - - POWER ISA 3.0 - Phased in. - - - - - vector _Float16 vec_mergel (vector _Float16, vector - _Float16); - - VEC_MERGEO (ARG1, ARG2) @@ -5681,6 +4807,22 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned short); + + + + + + vector double vec_or (vector bool long long, vector double); + + + + + + + + vector double vec_or (vector double, vector bool long long); + + @@ -5689,6 +4831,22 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector double vec_or (vector double, vector double); + + + + + + vector float vec_or (vector bool int, vector float); + + + + + + + + vector float vec_or (vector float, vector bool int); + + @@ -5844,10 +5002,11 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Packs information from each element of two vectors into the result vector. Result value: - For integer types, the value of each element of the result + For integer types, theThe + value of each element of the result vector is taken from the low-order half of the corresponding element of the result of concatenating ARG1 and ARG2. - For floating-point types, the value of each element of the + For floating-point types, the value of each element of the result vector is the corresponding element of the result of concatenating ARG1 and ARG2, rounded to the result type. 
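For integer types, the VEC_PACK rule keeps the low-order half of each element of the concatenation of ARG1 and ARG2. The sketch below models this for 32-bit elements packed into 16-bit elements. `pack_u32` is a hypothetical name. The sketch assumes ARG1 supplies result elements 0 through 3 and ARG2 supplies elements 4 through 7.

```c
#include <stdint.h>

/* Hypothetical model of the integer form of vec_pack for 32-bit elements:
   concatenate a and b (eight elements in total) and keep the low-order
   16 bits of each one. There is no saturation. */
void pack_u32(const uint32_t a[4], const uint32_t b[4], uint16_t r[8])
{
    for (int i = 0; i < 4; i++) {
        r[i]     = (uint16_t)(a[i] & 0xFFFFu);
        r[i + 4] = (uint16_t)(b[i] & 0xFFFFu);
    }
}
```

Values that do not fit in 16 bits are truncated (taken modulo 2^16), not saturated.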
@@ -5942,44 +5101,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> double); - - - POWER ISA 3.0 - Phased in. - - - - - vector _Float16 vec_pack (vector float, vector - float); - - - - - VEC_PACK_TO_SHORT_FP32 (ARG1, ARG2) - POWER ISA 3.0 - - - Purpose: - Packs eight single-precision 32-bit floating-point numbers - into a vector of eight 16-bit floating-point numbers. - Result value: - The value is a vector consisting of eight 16-bit elements, - each representing a 16-bit floating-point number that was created - by converting the corresponding single-precision value to - half-precision. - - - - - POWER ISA 3.0 - - - vector unsigned short vec_pack_to_short_fp32 (vector float, - vector float); - - - VEC_PACKPX (ARG1, ARG2) @@ -6152,74 +5273,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned int); - - - VEC_PARITY_LSBB (ARG1) - POWER ISA 3.0 - - - Purpose: - Compute parity on the least-significant bit of each - byte. - Result value: - Returns a vector with each element containing the parity of - the low-order bit of each of the bytes in that element. - - - - - POWER ISA 3.0 - - - vector unsigned int vec_parity_lsbb (vector signed - int); - - - - - POWER ISA 3.0 - - - vector unsigned int vec_parity_lsbb (vector unsigned - int); - - - - - POWER ISA 3.0 - - - vector unsigned __int128 vec_parity_lsbb (vector - signed__int128); - - - - - POWER ISA 3.0 - - - vector unsigned __int128 vec_parity_lsbb (vector - unsigned__int128); - - - - - POWER ISA 3.0 - - - vector unsigned long long vec_parity_lsbb (vector signed - long long); - - - - - POWER ISA 3.0 - - - vector unsigned long long vec_parity_lsbb (vector unsigned - long long); - - VEC_PERM (ARG1, ARG2, ARG3) @@ -6373,18 +5426,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned char); - - - POWER ISA 3.0 - Phased in. 
- - - - - vector _Float16 vec_perm (vector _Float16, vector _Float16, - vector unsigned char); - - VEC_PERMXOR (ARG1, ARG2, ARG3) @@ -6736,17 +5777,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector float vec_revb (vector float); - - - POWER ISA 3.0 - Phased in. - - - - - vector _Float16 vec_revb (vector _Float16); - - VEC_REVE (ARG1) @@ -6876,17 +5906,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector float vec_reve (vector float); - - - POWER ISA 3.0 - Phased in. - - - - - vector _Float16 vec_reve (vector _Float16); - - VEC_RINT (ARG1) @@ -7007,78 +6026,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned short); - - - VEC_RLMI (ARG1, ARG2, ARG3) - POWER ISA 3.0 - - - Purpose: - Rotates each element of a vector left and inserts each - element under a mask. - Result value: - The result is obtained by rotating each element of vector - ARG1 left and inserting it under mask into ARG2. ARG3 bits 11:15 - contain the mask beginning, bits 19:23 contain the mask end, and - bits 27:31 contain the shift count. - - - - - POWER ISA 3.0 - - - vector unsigned int vec_rlmi (vector unsigned int, vector - unsigned int, vector unsigned int); - - - - - POWER ISA 3.0 - - - vector unsigned long long vec_rlmi (vector unsigned long - long, vector unsigned long long, vector unsigned long - long); - - - - - VEC_RLNM (ARG1, ARG2, ARG3) - POWER ISA 3.0 - - - Purpose: - Rotates each element of a vector left; then intersects - (AND) it with a mask. - Result value: - Each element of vector ARG1 is rotated left; then - intersected (AND) with a mask specified by ARG3. - ARG3 contains the mask begin, mask end, and shift count for - each element. The shift count is in the low-order byte, the mask - end is in the next higher byte, and the mask begin is in the next - higher byte. 
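The VEC_RLNM control-word layout described above can be modeled for one 32-bit element. The shift count sits in the low-order byte, the mask end in the next byte, and the mask begin in the byte above that. `rlnm_u32` is a hypothetical name. The sketch rests on three assumptions:

- Power ISA bit numbering, in which bit 0 is the most-significant bit.
- A mask that wraps around when the begin position exceeds the end position.
- Only the low 5 bits of each control field are significant.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n (0-31) bits. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Build a mask with bit 0 as the most-significant bit: bits mb through me
   (inclusive) are 1, wrapping from bit 31 to bit 0 when mb > me. */
static uint32_t mask32(unsigned mb, unsigned me)
{
    uint32_t m = 0;
    for (unsigned i = 0; i < 32; i++) {
        int inside = (mb <= me) ? (i >= mb && i <= me) : (i >= mb || i <= me);
        if (inside)
            m |= 0x80000000u >> i;
    }
    return m;
}

/* Hypothetical model of vec_rlnm for one 32-bit element. ctl holds the
   shift count in its low-order byte, the mask end in the next higher byte,
   and the mask begin in the byte above that (low 5 bits of each used). */
uint32_t rlnm_u32(uint32_t a, uint32_t ctl)
{
    unsigned sh = ctl & 31;
    unsigned me = (ctl >> 8) & 31;
    unsigned mb = (ctl >> 16) & 31;
    return rotl32(a, sh) & mask32(mb, me);
}
```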
- - - - - POWER ISA 3.0 - - - vector unsigned int vec_rlnm (vector unsigned int, vector - unsigned int, vector unsigned int); - - - - - POWER ISA 3.0 - - - vector unsigned long long vec_rlnm (vector unsigned long - long, vector unsigned long long, vector unsigned long - long); - - VEC_ROUND (ARG1) @@ -7333,7 +6280,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector signed long long, vector bool long long); - + Phased in. @@ -7355,7 +6302,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> long, vector unsigned long long, vector bool long long); - + Phased in. @@ -7448,30 +6395,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned int); - - - POWER ISA 3.0 - Phased in. - - - - - vector _Float16 vec_sel (vector _Float16, vector _Float16, - vector bool short); - - - - - POWER ISA 3.0 - Phased in. - - - - - vector _Float16 vec_sel (vector _Float16, vector _Float16, - vector unsigned short); - - VEC_SIGNED (ARG1) @@ -7481,7 +6404,9 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Converts a vector of floating-point numbers to a vector of signed integers. Result value: - Target elements are obtained by truncating the respective + Target elements are obtained by + truncatingconverting + the respective source elements to signed integers. @@ -7510,7 +6435,9 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Converts a vector of floating-point numbers to vector of signed integers. Result value: - Target elements are obtained by truncating the source + Target elements are obtained by + truncatingconverting + the source elements to the signed integers as follows: @@ -7540,7 +6467,9 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Converts an input vector to a vector of signed integers. Result value: - The even target elements are obtained by truncating the + The even target elements are obtained by + truncatingconverting + the source elements to signed integers as follows: Target elements 0 and 2 contain the converted values of the input vector. 
@@ -7563,7 +6492,9 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Converts an input vector to a vector of signed integers. Result value: - The odd target elements are obtained by truncating the + The odd target elements are obtained by + truncatingconverting + the source elements to signed integers as follows: Target elements 1 and 3 contain the converted values of the input vector. @@ -7670,7 +6601,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Purpose: - Left shifts a double vector (that is, two concatenated + Left shifts a double vector (that is, two concatenated vectors) by a given number of bytes. For vec_sld being performed on the vector bool and floating-point types, the result is undefined, when the specified shift count is not a multiple of @@ -7805,7 +6736,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned short, const int); - + Phased in. @@ -7922,7 +6853,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Purpose: Left shifts a vector by a given number of bits. Result value: - The result is the contents of ARG1, shifted left by the + The result is the contents of ARG1, shifted left by the number of bits specified by the three least-significant bits of ARG2. The bits that are shifted out are replaced by zeros. The shift count must have been replicated into all bytes of the shift @@ -7965,7 +6896,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned char); - + Phased in. @@ -7976,7 +6907,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned char); - + Phased in. @@ -8199,51 +7130,17 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> - vector float vec_slo (vector float, vector signed - char); - - - - - - - - vector float vec_slo (vector float, vector unsigned - char); - - - - - VEC_SLV (ARG1, ARG2) - POWER ISA 3.0 - - - Purpose: - Left-shifts a vector by a varying number of bits by - element. - Result value: - For each integer 0 - i - 14, let X - i be the halfword formed by concatenating - elements i and i+1 of ARG1. 
Let X - 15 be the halfword formed by concatenating - element 15 of ARG1 with a zero byte. Let S - i be the value in the three least-significant - bits of element i of ARG2. Then, element i of the result vector - contains the value formed from bits S - i through S - i+ 7 of X - i. + vector float vec_slo (vector float, vector signed + char); - POWER ISA 3.0 + - vector unsigned char vec_slv (vector unsigned char, vector - unsigned char); + vector float vec_slo (vector float, vector unsigned + char); @@ -8397,18 +7294,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector float vec_splat (vector float, const int); - - - POWER ISA 3.0 - Phased in. - - - - - vector _Float16 vec_splat (vector _Float16, const - int); - - VEC_SPLAT_S8 (ARG1) @@ -8645,17 +7530,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector float vec_splats (float); - - - POWER ISA 3.0 - Phased in. - - - - - vector _Float16 vec_splats (_Float16); - - VEC_SQRT (ARG1) @@ -8872,9 +7746,10 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Result value: The result is the contents of ARG1, shifted right by the number of bits specified by the 3 least-significant bits of ARG2. - The bits that are shifted out are replaced by zeros. The shift + The bits that are shifted out are replaced by zeros. + The shift count must have been replicated into all bytes of the shift count - specification. + specification. @@ -8913,7 +7788,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned char); - + Phased in. @@ -8924,7 +7799,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned char); - + Phased in. @@ -9164,39 +8039,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> char); - - - VEC_SRV (ARG1, ARG2) - POWER ISA 3.0 - - - Purpose: - Right-shifts a vector by a varying number of bits by - element. - Result value: - For each integer 1 - i - 15, let X - i be the halfword formed by concatenating - elements i and i+1 of ARG1. Let X - 0 be the halfword formed by concatenating a - zero byte with element 0 of ARG1. 
Let S - i be the value in the three least-significant - bits of element i of ARG2. Then element i of the result vector - contains the value formed from bits 8 - S - i through 15 - S - i. - - - - - POWER ISA 3.0 - - - vector unsigned char vec_srv (vector unsigned char, vector - unsigned char); - - VEC_SUB (ARG1, ARG2) @@ -9640,41 +8482,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> signed int); - - - VEC_TEST_DATA_CLASS (ARG1, ARG2) - POWER ISA 3.0 - - - Purpose: - Determines the data class for each floating-point - element. - Result value: - Each element is set to all ones if the corresponding - element of ARG1 matches one of the possible data types selected - by ARG2. If not, each element is set to all zeros. ARG2 can - select one of the data types defined in - . - - - - - POWER ISA 3.0 - - - vector bool int vec_test_data_class (vector float, const - int); - - - - - POWER ISA 3.0 - - - vector bool long long vec_test_data_class (vector double, - const int); - - VEC_TRUNC (ARG1) @@ -9717,7 +8524,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> If ARG1 is an integer vector, the value of each element of the result is the value of the corresponding element of the most-significant half of ARG1. - If ARG1 is a floating-point vector, the value of each + If ARG1 is a floating-point vector, the value of each element of the result is the value of the corresponding element of the most-significant half of ARG1, widened to the result precision. @@ -9816,17 +8623,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector double vec_unpackh (vector float); - - - POWER ISA 3.0 - Phased in. - - - - - vector float vec_unpackh (vector _Float16); - - VEC_UNPACKL (ARG1) @@ -9839,7 +8635,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> If ARG1 is an integer vector, the value of each element of the result is the value of the corresponding element of the least-significant half of ARG1. 
- If ARG1 is a floating-point vector, the value of each + If ARG1 is a floating-point vector, the value of each element of the result is the value of the corresponding element of the least-significant half of ARG, widened to the result precision. @@ -9894,7 +8690,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned int vec_unpackl (vector pixel); - + @@ -9938,17 +8734,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector double vec_unpackl (vector float); - - - POWER ISA 3.0 - Phased in. - - - - - vector float vec_unpackl (vector _Float16); - - VEC_UNSIGNED (ARG1) @@ -9958,7 +8743,9 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Converts a vector of double-precision numbers to a vector of unsigned integers. Result value: - Target elements are obtained by truncating the respective + Target elements are obtained by + truncatingconverting + the respective source elements to unsigned integers. @@ -9988,7 +8775,9 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Converts a vector of double-precision numbers to a vector of unsigned integers. Result value: - Target elements are obtained by truncating the source + Target elements are obtained by + truncatingconverting + the source elements to the unsigned integers as follows: @@ -10018,7 +8807,9 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Converts an input vector to a vector of unsigned integers. Result value: - The even target elements are obtained by truncating the + The even target elements are obtained by + truncatingconverting + the source elements to unsigned integers as follows: Target elements 0 and 2 contain the converted values of the input vector. @@ -10041,7 +8832,9 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Converts an input vector to a vector of unsigned integers. 
Result value: - The odd target elements are obtained by truncating the + The odd target elements are obtained by + truncatingconverting + the source elements to unsigned integers as follows: Target elements 1 and 3 contain the converted values of the input vector. @@ -10159,344 +8952,155 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> - - - - vector unsigned short vec_xl (long long, unsigned short - *); - - - - - - - - vector double vec_xl (long long, double *); - - - - - - - - vector float vec_xl (long long, float *); - - - - - POWER ISA 3.0. - Phased in. - - - - - vector _Float16 vec_xl (long long, _Float16 *); - - - - - VEC_XL_BE (ARG1. ARG2) - - - Purpose: - In little-endian environments, loads the elements of the - 16-byte vector ARG1 starting with the highest-numbered element at - the memory address specified by the displacement ARG1 and the - pointer ARG2. In big-endian environments, this operator performs - the same operation as VEC_XL. - Result value: - In little-endian mode, loads the elements of the vector in - sequential order, with the highest-numbered element loaded from - the lowest data address and the lowest-numbered element of the - vector at the highest address. All elements are loaded in - little-endian data format. - This function adds the displacement and the pointer R-value - to obtain the address for the load operation. It does not - truncate the affected address to a multiple of 16 bytes. 
- - - - - - - - vector signed char vec_xl_be (long long, signed char - *); - - - - - - - - vector unsigned char vec_xl_be (long long, unsigned char - *); - - - - - - - - vector signed int vec_xl_be (long long, signed int - *); - - - - - - - - vector unsigned int vec_xl_be (long long, unsigned int - *); - - - - - - - - vector signed __int128 vec_xl_be (long long, signed - __int128 *); - - - - - - - - vector unsigned __int128 vec_xl_be (long long, unsigned - __int128 *); - - - - - - - - vector signed long long vec_xl_be (long long, signed long - long *); - - - - - - - - vector unsigned long long vec_xl_be (long long, unsigned - long long *); - - - - - - - - vector signed short vec_xl_be (long long, signed short - *); - - - - - - - - vector unsigned short vec_xl_be (long long, unsigned short - *); - - - - - - - - vector double vec_xl_be (long long, double *); - - - - - - - - vector float vec_xl_be (long long, float *); - - - - - POWER ISA 3.0 - Phased in. - - - - - vector _Float16 vec_xl_be (long long, _Float16 *); - - - - - VEC_XL_LEN (ARG1, ARG2) - POWER ISA 3.0 + - Purpose: - Loads a vector of a specified byte length. - Result value: - Loads the number of bytes specified by ARG2 from the - address specified in ARG1. Initializes elements in order from the - byte stream (as defined by the endianness of the operating - environment). Any bytes of elements that cannot be initialized - from the number of loaded bytes have a zero value. - At least 0 and at most 16 bytes will be loaded. The length - is specified by the least-significant byte of ARG2, as min (mod - (ARG2, 256), 16). The behavior is undefined if the length - argument is outside of the range 0 - 255, or if it is not a - multiple of the vector element size. 
+ vector unsigned short vec_xl (long long, unsigned short + *); - POWER ISA 3.0 + - vector signed char vec_xl_len (signed char *, - size_t); + vector double vec_xl (long long, double *); - POWER ISA 3.0 + - vector unsigned char vec_xl_len (unsigned char *, - size_t); + vector float vec_xl (long long, float *); - POWER ISA 3.0 + VEC_XL_BE (ARG1, ARG2) - vector signed int vec_xl_len (signed int *, size_t); + Purpose: + In little-endian environments, loads the elements of the + 16-byte result vector, starting with the highest-numbered element, from + the memory address specified by the displacement ARG1 and the + pointer ARG2. In big-endian environments, this operator performs + the same operation as VEC_XL. + Result value: + In little-endian mode, loads the elements of the vector in + sequential order, with the highest-numbered element loaded from + the lowest data address and the lowest-numbered element of the + vector at the highest address. All elements are loaded in + little-endian data format. + This function adds the displacement and the pointer R-value + to obtain the address for the load operation. It does not + truncate the affected address to a multiple of 16 bytes.
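The element ordering that VEC_XL_BE produces in little-endian mode can be modelled in portable C. This is only an illustrative sketch, not part of the ABI: `model_vec_xl_be_int` is a hypothetical helper that emulates the `vector signed int` form with a plain `int32_t[4]` array, and it assumes that the displacement is a byte offset added to the pointer.

```c
#include <stdint.h>

/* Hypothetical scalar model (not an ABI interface) of
   vec_xl_be (long long, signed int *) in little-endian mode.
   Assumption: disp is a byte offset added to base. */
static void model_vec_xl_be_int(long long disp, const void *base,
                                int32_t result[4])
{
    const unsigned char *p = (const unsigned char *)base + disp;

    for (int k = 0; k < 4; k++) {
        const unsigned char *q = p + 4 * k;
        /* Each element is read in little-endian data format. */
        uint32_t w = (uint32_t)q[0]
                   | ((uint32_t)q[1] << 8)
                   | ((uint32_t)q[2] << 16)
                   | ((uint32_t)q[3] << 24);
        /* The lowest address fills the highest-numbered element. */
        result[3 - k] = (int32_t)w;
    }
}
```

In big-endian environments the text above states that VEC_XL_BE behaves like VEC_XL, so this element reversal would not apply there.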
- POWER ISA 3.0 + - vector unsigned int vec_xl_len (unsigned int *, - size_t); + vector signed char vec_xl_be (long long, signed char + *); - POWER ISA 3.0 + - vector signed __int128 vec_xl_len (signed __int128 *, - size_t); + vector unsigned char vec_xl_be (long long, unsigned char + *); - POWER ISA 3.0 + - vector unsigned __int128 vec_xl_len (unsigned __int128 *, - size_t); + vector signed int vec_xl_be (long long, signed int + *); - POWER ISA 3.0 + - vector signed long long vec_xl_len (signed long long *, - size_t); + vector unsigned int vec_xl_be (long long, unsigned int + *); - POWER ISA 3.0 + - vector unsigned long long vec_xl_len (unsigned long long *, - size_t); + vector signed __int128 vec_xl_be (long long, signed + __int128 *); - POWER ISA 3.0 + - vector signed short vec_xl_len (signed short *, - size_t); + vector unsigned __int128 vec_xl_be (long long, unsigned + __int128 *); - POWER ISA 3.0 + - vector unsigned short vec_xl_len (unsigned short *, - size_t); + vector signed long long vec_xl_be (long long, signed long + long *); - POWER ISA 3.0 + - vector double vec_xl_len (double *, size_t); + vector unsigned long long vec_xl_be (long long, unsigned + long long *); - POWER ISA 3.0 + - vector float vec_xl_len (float *, size_t); + vector signed short vec_xl_be (long long, signed short + *); - POWER ISA 3.0 + - vector _Float16 vec_xl_len (_Float16 *, size_t); + vector unsigned short vec_xl_be (long long, unsigned short + *); - VEC_XL_LEN_R (ARG1, ARG2) - POWER ISA 3.0 + - Purpose - Loads a vector of a specified byte length, - right-justified. - Result value: - Loads the number of bytes specified by ARG2 from the - address specified in ARG1, right justified with the first byte to - the left and the last to the right. Initializes elements in order - from the byte stream (as defined by the endianness of the - operating environment). Any bytes of elements that cannot be - initialized from the number of loaded bytes have a zero - value. 
- At least 0 and at most 16 bytes will be loaded. The length - is specified by the least-significant byte of ARG2, as min (mod - (ARG2, 256), 16). The behavior is undefined if the length - argument is outside of the range 0 - 255, or if it is not a - multiple of the vector element size. + vector double vec_xl_be (long long, double *); - POWER ISA 3.0 + - vector unsigned char vec_xl_len_r (unsigned char *, - size_t); + vector float vec_xl_be (long long, float *); @@ -10767,18 +9371,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> void vec_xst (vector float, long long, float *); - - - POWER ISA 3.0 - Phased in. - - - - - void vec_xst (vector _Float16, long long, _Float16 - *); - - VEC_XST_BE (ARG1, ARG2, ARG3) @@ -10908,184 +9500,6 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> void vec_xst_be (vector float, long long, float *); - - - POWER ISA 3.0 - Phased in. - - - - - void vec_xst_be (vector _Float16, long long, _Float16 - *); - - - - - VEC_XST_LEN (ARG1, ARG2, ARG3) - POWER ISA 3.0 - - - Purpose: - Stores a vector of a specified byte length. - Result value: - Stores the number of bytes specified by ARG3 of the vector - ARG1 to the address specified in ARG2. The bytes are obtained - starting from the lowest-numbered byte of the lowest-numbered - element (as defined by the endianness of the operating - environment). All bytes of an element are accessed before - proceeding to the next higher element. - At least 0 and at most 16 bytes will be stored. The length - is specified by the least-significant byte of ARG3, as min (mod - (ARG2, 256), 16). The behavior is undefined if the length - argument is outside of the range 0 - 255, or if it is not a - multiple of the vector element size. 
- - - - - POWER ISA 3.0 - - - void vec_xst_len (vector signed char, signed char *, - size_t); - - - - - POWER ISA 3.0 - - - void vec_xst_len (vector unsigned char, unsigned char *, - size_t); - - - - - POWER ISA 3.0 - - - void vec_xst_len (vector signed int, signed int *, - size_t); - - - - - POWER ISA 3.0 - - - void vec_xst_len (vector unsigned int, unsigned int *, - size_t); - - - - - POWER ISA 3.0 - - - void vec_xst_len (vector signed __int128, signed __int128 - *, size_t); - - - - - POWER ISA 3.0 - - - void vec_xst_len (vector unsigned __int128, unsigned - __int128 *, size_t); - - - - - POWER ISA 3.0 - - - void vec_xst_len (vector signed long long, signed long long - *, size_t); - - - - - POWER ISA 3.0 - - - void vec_xst_len (vector unsigned long long, unsigned long - long *, size_t); - - - - - POWER ISA 3.0 - - - void vec_xst_len (vector signed short, signed short *, - size_t); - - - - - POWER ISA 3.0 - - - void vec_xst_len (vector unsigned short, unsigned short *, - size_t); - - - - - POWER ISA 3.0 - - - void vec_xst_len (vector double, double *, size_t); - - - - - POWER ISA 3.0 - - - void vec_xst_len (vector float, float *, size_t); - - - - - POWER ISA 3.0 - - - void vec_xst_len (vector _Float16, _Float16 *, - size_t); - - - - - VEC_XST_LEN_R (ARG1, ARG2, ARG3) - POWER ISA 3.0 - - - Purpose: - Stores a right-justified vector of a specified byte - length. - Result value: - Stores the number of bytes specified by ARG3 of the - right-justified vector ARG1 to the address specified by - ARG2. - At least 0 and at most 16 bytes will be stored. The length - is specified by the least-significant byte of ARG3, as min (mod - (ARG2, 256), 16). The behavior is undefined if the length - argument is outside of the range 0 - 255, or if it is not a - multiple of the vector element size. - - - - - POWER ISA 3.0 - - - void vec_xst_len_r (vector unsigned char, unsigned char *, - size_t); - - @@ -11667,8 +10081,8 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Phased in. 
This optional function is being phased in, and it might not be available on all implementations. - Phased-in interfaces are optional for the current generation of - compliant systems. + Phased-in interfaces are optional for the current generation of + compliant systems. int vec_all_lt (vector unsigned long long, vector unsigned @@ -12964,8 +11378,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> each element must be established as big-endian. Thus, for example, a SHA computation in a little-endian environment may be performed by using the following sequence: - le_result = vec_revb(vec_shasigma_be(vec_revb(le_input), 0, - 0)); + le_result = vec_revb(vec_shasigma_be(vec_revb(le_input), 0, 0)); Built-In Vector Operators for Secure Hashing and Finite Field Arithmetic @@ -13139,12 +11552,12 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> order for each vector must be established as big endian.Thus, for example, an SBOX computation in a little-endian environment may be performed by using the following sequence: - le_result = vec_reve(vec_sbox(vec_reve(le_input), 0, 0)); + le_result = vec_reve(vec_sbox(vec_reve(le_input), 0, 0));Alternatively, the vec_xl_be and vec_xst_be operators may be used to access operands as follows: - input = vec_xl_be(0, &le_input); - result = vec_sbox(input); - vec_xst_be(result,0, &le_result); + input = vec_xl_be(0, &le_input); +result = vec_sbox(input); +vec_xst_be(result,0, &le_result);
Built-In Vector Operators for the Advanced Encryption Standard @@ -13452,7 +11865,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128">
-
+
PowerSIMD API Named Constants This section defines constants for use by the PowerSIMD vector programming operators. They may be defined either as macros or as named @@ -15268,7 +13681,8 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vsx2 - vector unsigned int vec_vpopcntw (vector unsigned + vector unsigned int vec_vpopcntw (vector + unsigned int); @@ -16174,7 +14588,8 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vsx2 - vector signed long long vec_vupklsw (vector signed + vector signed long long vec_vupklsw (vector + signed int); @@ -16183,7 +14598,8 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vsx2 - vector unsigned long long vec_vupklsw (vector unsigned + vector unsigned long long vec_vupklsw (vector + unsigned int); @@ -17020,7 +15436,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector bool short); - + VEC_INSERT (ARG1, ARG2, ARG3) @@ -17036,7 +15452,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> the vector to determine the element position. - + @@ -17045,7 +15461,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> char, signed int); - + @@ -17054,7 +15470,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> signed int); - + @@ -17063,7 +15479,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector bool long long, signed int); - + @@ -17490,7 +15906,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector bool short); - + VEC_MLADD (ARG1, ARG2, ARG3) @@ -17507,7 +15923,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> using modular arithmetic. 
- + @@ -17516,7 +15932,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> signed short, vector signed short); - + @@ -17525,7 +15941,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned short, vector unsigned short); - + @@ -17534,7 +15950,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector signed short, vector signed short); - + @@ -17900,7 +16316,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> bool short); - + @@ -17909,7 +16325,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> double); - + @@ -17918,7 +16334,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> long); - + @@ -17926,7 +16342,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector float vec_or (vector bool int, vector float); - + @@ -18091,7 +16507,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector bool short); - + @@ -18100,7 +16516,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> double); - + @@ -18109,7 +16525,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> long); - + @@ -18118,7 +16534,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> float); - + @@ -18127,7 +16543,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> int); - + VEC_SEL (ARG1, ARG2, ARG3) @@ -18142,7 +16558,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> corresponding bit of ARG2. - + @@ -18151,7 +16567,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned long long); - + VEC_SLL (ARG1, ARG2) @@ -18166,7 +16582,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> count specification. 
- + @@ -18175,7 +16591,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> char); - + @@ -18184,7 +16600,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> int); - + @@ -18193,7 +16609,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> short); - + @@ -18202,7 +16618,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned int); - + @@ -18211,7 +16627,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned short); - + @@ -18220,7 +16636,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned int); - + @@ -18229,7 +16645,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned short); - + @@ -18238,7 +16654,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> char); - + @@ -18247,7 +16663,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> int); - + @@ -18256,7 +16672,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> short); - + @@ -18265,7 +16681,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned int); - + @@ -18274,7 +16690,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned short); - + @@ -18283,7 +16699,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned int); - + @@ -18292,7 +16708,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned short); - + @@ -18301,7 +16717,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned char); - + @@ -18311,7 +16727,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned long long); - + @@ -18321,7 +16737,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned short); - + @@ -18331,7 +16747,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned long long); - + @@ -18341,7 +16757,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned short); - + @@ -18351,7 +16767,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> long, vector unsigned long long); - + @@ -18361,7 +16777,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> long, vector unsigned short); - + @@ -18370,7 +16786,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> int); - + @@ -18379,7 +16795,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> short); - + @@ 
-18388,7 +16804,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned char); - + @@ -18397,7 +16813,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned int); - + @@ -18406,7 +16822,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned short); - + @@ -18415,7 +16831,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned int); - + @@ -18424,7 +16840,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned short); - + @@ -18433,7 +16849,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned int); - + @@ -18452,9 +16868,10 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> Result value: The result is the contents of ARG1, shifted right by the number of bits specified by the 3 least-significant bits of ARG2. - The bits that are shifted out are replaced by zeros. The shift + The bits that are shifted out are replaced by zeros. + The shift count must have been replicated into all bytes of the shift count - specification. + specification. @@ -18484,7 +16901,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> short); - + @@ -18493,7 +16910,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned int); - + @@ -18502,7 +16919,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned short); - + @@ -18511,7 +16928,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned int); - + @@ -18547,7 +16964,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> short); - + @@ -18556,7 +16973,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned int); - + @@ -18565,7 +16982,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned short); - + @@ -18574,7 +16991,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned int); - + @@ -18583,7 +17000,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned short); - + @@ -18593,7 +17010,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned long long); - + @@ -18603,7 +17020,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned short); - + @@ -18613,7 +17030,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> long, vector unsigned long 
long); - + @@ -18623,7 +17040,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> long, vector unsigned short); - + @@ -18632,7 +17049,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> int); - + @@ -18668,7 +17085,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned short); - + @@ -18677,7 +17094,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned int); - + @@ -18686,7 +17103,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> unsigned short); - + @@ -18695,7 +17112,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector unsigned int); - + @@ -19000,7 +17417,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> corresponding single-precision floating-point elements of ARG1 and ARG2. - Deprecated: The preferred form of this vector + Deprecated: The preferred form of this vector built-in function is vec_ctlz. @@ -19227,7 +17644,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> vector bool short); - + @@ -19236,7 +17653,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> double); - + @@ -19245,7 +17662,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> long); - + @@ -19254,7 +17671,7 @@ xml:id="dbdoclet.50655245_pgfId-1138128"> float); - + diff --git a/specification/app_b.xml b/specification/app_b.xml index 61e52a7..f0088ac 100644 --- a/specification/app_b.xml +++ b/specification/app_b.xml @@ -1,6 +1,6 @@ +xml:id="dbdoclet.50655245_pgfId-1450875" revisionflag="added"> Binary-Coded Decimal Built-In Functions Binary-coded decimal (BCD) values are compressed; each decimal digit and sign bit occupies 4 bits. Digits are ordered right-to-left in the order @@ -19,8 +19,8 @@ xml:id="dbdoclet.50655245_pgfId-1450875"> bytes. BCD built-in functions are valid only when - - march or - - qarch is set to target POWER8 processors or + march or - + qarch is set to target POWER8 processors or later. @@ -411,12 +411,11 @@ xml:id="dbdoclet.50655245_pgfId-1450875"> in . 
The bcd data type is defined as follows in the bcd.h: - typedef bcd vector unsigned char; + typedef bcd vector unsigned char; The header file also defines a bcd_default_format as follows: - #ifndef bcd_default_format - #define bcd_default_format __BCD_SIGN_IBM - #endif - + #ifndef bcd_default_format +#define bcd_default_format __BCD_SIGN_IBM +#endif BCD Functions Defined by bcd.h @@ -810,42 +809,45 @@ xml:id="dbdoclet.50655245_pgfId-1450875">
Sample bcd.h Listing - - #ifndef __BCD_H - #define __BCD_H - typedef bcd vector unsigned char; - #define BCD_FORMAT_IBM 0 - #define BCD_FORMAT_Z 0 - #define BCD_FORMAT_POWER 0 - #define BCD_FORMAT_IBMi 1 - #define BCD_FORMAT_I 1 - #define BCD_FORMAT_NCR 1 - #ifndef bcd_default_format - #define bcd_default_format __BCD_SIGN_IBM - #endif - #define bcd_add(a,b) ((bcd)__builtin_bcdadd (a,b,bcd_default_format)) - #define bcd_sub(A,b) ((bcd)__builtin_bcdsub (a,b,bcd_default_format)) - #define bcd_add_ofl(a,b) ((_Bool)__builtin_bcdadd_ofl (a,b)) - #define bcd_add_ofl(a,b) ((_Bool)__builtin_bcdsub_ofl (a,b)) - #define bcd_invalid(a) ((_Bool)__builtin_bcd_invalid (a)) - #define bcd_cmpeq(a,b) ((_Bool)__builtin_bcdcmpeq (a,b)) - #define bcd_cmpge(a,b) ((_Bool)__builtin_bcdcmpge (a,b)) - #define bcd_cmpgt(a,b) ((_Bool)__builtin_bcdcmpgt (a,b)) - #define bcd_cmple(a,b) ((_Bool)__builtin_bcdcmple (a,b)) - #define bcd_cmplt(a,b) ((_Bool)__builtin_bcdcmplt (a,b)) - #define bcd_cmpne(a,b) (!(_Bool)__builtin_bcdcmpeq (a,b)) - #define bcd_xl(a,b) ((bcd)vec_xl_len_r(a,b)) - #define bcd_xst(a,b) ((bcd)vec_xst_len_r(a,b)) - #define bcd_quantize(d) (__builtin_bcdquantize(d)) - #define bcd_dfp(a) (__builtin_bcd2dfp (a)) - #define bcd_dfp2bcd(DFP) ((bcd)__builtin_vec_DFP2BCD (_Decimal128 dfp)) - #define bcd_string2bcd(string) ((bcd) __bcd_string2bcd (string, bcd_default_format) - #define bcd_mul10(a) ((bcd) __builtin_bcdmul10 (a)) - #define bcd_div10(a) ((bcd) __builtin_bcddiv10 (a)) - #define bcd_mul(a,b) ((bcd) __bcd_mul (a,b,bcd_default_format)) - #define bcd_div(a,b) ((bcd) __bcd_div (a,b,bcd_default_format)) - #endif /* __BCD_H */ - + #ifndef __BCD_H +#define __BCD_H + +typedef vector unsigned char bcd; + +#define BCD_FORMAT_IBM 0 +#define BCD_FORMAT_Z 0 +#define BCD_FORMAT_POWER 0 +#define BCD_FORMAT_IBMi 1 +#define BCD_FORMAT_I 1 +#define BCD_FORMAT_NCR 1 + +#ifndef bcd_default_format +#define bcd_default_format __BCD_SIGN_IBM +#endif + +#define bcd_add(a,b) 
((bcd)__builtin_bcdadd (a,b,bcd_default_format)) +#define bcd_sub(a,b) ((bcd)__builtin_bcdsub (a,b,bcd_default_format)) +#define bcd_add_ofl(a,b) ((_Bool)__builtin_bcdadd_ofl (a,b)) +#define bcd_sub_ofl(a,b) ((_Bool)__builtin_bcdsub_ofl (a,b)) +#define bcd_invalid(a) ((_Bool)__builtin_bcd_invalid (a)) +#define bcd_cmpeq(a,b) ((_Bool)__builtin_bcdcmpeq (a,b)) +#define bcd_cmpge(a,b) ((_Bool)__builtin_bcdcmpge (a,b)) +#define bcd_cmpgt(a,b) ((_Bool)__builtin_bcdcmpgt (a,b)) +#define bcd_cmple(a,b) ((_Bool)__builtin_bcdcmple (a,b)) +#define bcd_cmplt(a,b) ((_Bool)__builtin_bcdcmplt (a,b)) +#define bcd_cmpne(a,b) (!(_Bool)__builtin_bcdcmpeq (a,b)) +#define bcd_xl(a,b) ((bcd)vec_xl_len_r(a,b)) +#define bcd_xst(a,b) ((bcd)vec_xst_len_r(a,b)) +#define bcd_quantize(d) (__builtin_bcdquantize(d)) +#define bcd_dfp(a) (__builtin_bcd2dfp (a)) +#define bcd_dfp2bcd(DFP) ((bcd)__builtin_vec_DFP2BCD (_Decimal128 dfp)) +#define bcd_string2bcd(string) ((bcd) __bcd_string2bcd (string, bcd_default_format)) +#define bcd_mul10(a) ((bcd) __builtin_bcdmul10 (a)) +#define bcd_div10(a) ((bcd) __builtin_bcddiv10 (a)) +#define bcd_mul(a,b) ((bcd) __bcd_mul (a,b,bcd_default_format)) +#define bcd_div(a,b) ((bcd) __bcd_div (a,b,bcd_default_format)) + +#endif /* __BCD_H */
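To make the layout "each decimal digit and sign occupies 4 bits" concrete, the following portable C sketch packs a signed integer into a 16-byte array that holds 31 digits plus a sign nibble. It is an illustration under stated assumptions, not part of bcd.h: `pack_signed_bcd` is a hypothetical name, the array uses big-endian byte numbering of the 128-bit value (byte 15 holds the least-significant digit and the sign), and the sign codes 0xC (positive) and 0xD (negative) are assumed to be the preferred codes of the IBM format.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not defined by bcd.h): packs value into
   signed packed-decimal form in out[0..15].
   Assumptions: out[15] holds the least-significant digit (high nibble)
   and the sign (low nibble); sign codes are 0xC (+) and 0xD (-). */
static void pack_signed_bcd(long long value, uint8_t out[16])
{
    int negative = value < 0;
    /* Negate in unsigned arithmetic so LLONG_MIN does not overflow. */
    unsigned long long mag = negative ? 0ULL - (unsigned long long)value
                                      : (unsigned long long)value;

    memset(out, 0, 16);
    out[15] = negative ? 0x0D : 0x0C;   /* sign nibble */

    /* Nibble n (n >= 1) holds decimal digit n-1; a 64-bit value has at
       most 20 digits, well within the 31-digit capacity. */
    for (int n = 1; mag != 0; n++) {
        uint8_t digit = (uint8_t)(mag % 10);
        mag /= 10;
        int byte = 15 - n / 2;
        if (n % 2)
            out[byte] |= (uint8_t)(digit << 4);  /* odd nibble: high half */
        else
            out[byte] |= digit;                  /* even nibble: low half */
    }
}
```

Under these assumptions, 123 packs to 0x12 0x3C in the last two bytes, and -45 packs to 0x04 0x5D.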
diff --git a/specification/app_glossary.xml b/specification/app_glossary.xml index bc6a413..ad2b2cb 100644 --- a/specification/app_glossary.xml +++ b/specification/app_glossary.xml @@ -16,7 +16,7 @@ xml:id="dbdoclet.50655246_33489"> Application binary interface - + AES @@ -24,7 +24,7 @@ xml:id="dbdoclet.50655246_33489"> Advanced Encryption Standard - + API @@ -32,7 +32,7 @@ xml:id="dbdoclet.50655246_33489"> Application programming interface - + ASCII @@ -40,7 +40,7 @@ xml:id="dbdoclet.50655246_33489"> American Standard Code for Information Interchange - + BCD @@ -56,7 +56,7 @@ xml:id="dbdoclet.50655246_33489"> Big-endian - + COBOL @@ -72,7 +72,7 @@ xml:id="dbdoclet.50655246_33489"> Condition Register - + CTR @@ -80,7 +80,7 @@ xml:id="dbdoclet.50655246_33489"> Count Register - + DFP @@ -96,7 +96,7 @@ xml:id="dbdoclet.50655246_33489"> Double precision - + DRN @@ -105,7 +105,7 @@ xml:id="dbdoclet.50655246_33489"> register. - + DSCR @@ -137,7 +137,7 @@ xml:id="dbdoclet.50655246_33489"> Debug with arbitrary record format - + EA @@ -153,7 +153,7 @@ xml:id="dbdoclet.50655246_33489"> Executable and Linking Format - + EOS @@ -177,7 +177,7 @@ xml:id="dbdoclet.50655246_33489"> Floating-Point Status and Control Register - + GCC @@ -185,7 +185,7 @@ xml:id="dbdoclet.50655246_33489"> GNU Compiler Collection - + GEP @@ -209,7 +209,7 @@ xml:id="dbdoclet.50655246_33489"> General Purpose Register - + HTM @@ -281,7 +281,7 @@ xml:id="dbdoclet.50655246_33489"> Little-endian - + LEP @@ -305,7 +305,7 @@ xml:id="dbdoclet.50655246_33489"> Least-significant byte - + MB @@ -313,7 +313,7 @@ xml:id="dbdoclet.50655246_33489"> Megabyte - + MSB @@ -321,7 +321,7 @@ xml:id="dbdoclet.50655246_33489"> Most-significant byte - + MSR @@ -329,7 +329,7 @@ xml:id="dbdoclet.50655246_33489"> Machine State Register - + N/A @@ -345,7 +345,7 @@ xml:id="dbdoclet.50655246_33489"> Not-a-Number - + NOP @@ -354,7 +354,7 @@ xml:id="dbdoclet.50655246_33489"> affect registers or generate bus activity. 
- + NOR @@ -362,7 +362,7 @@ xml:id="dbdoclet.50655246_33489"> In Boolean logic, the negation of a logical OR. - + OE @@ -387,7 +387,7 @@ xml:id="dbdoclet.50655246_33489"> Position-independent executable - + PIM @@ -411,7 +411,7 @@ xml:id="dbdoclet.50655246_33489"> Performance Monitor Registers - + POSIX @@ -419,7 +419,7 @@ xml:id="dbdoclet.50655246_33489"> Portable Operating System Interface - + PS @@ -427,7 +427,7 @@ xml:id="dbdoclet.50655246_33489"> Positive sign - + RN @@ -436,7 +436,7 @@ xml:id="dbdoclet.50655246_33489"> FPSCR register. - + RPG @@ -444,7 +444,7 @@ xml:id="dbdoclet.50655246_33489"> Report Program Generator - + SHA @@ -476,7 +476,7 @@ xml:id="dbdoclet.50655246_33489"> Special Purpose Register - + SVID @@ -516,7 +516,7 @@ xml:id="dbdoclet.50655246_33489"> Thread pointer - + UE @@ -533,7 +533,7 @@ xml:id="dbdoclet.50655246_33489"> Unit of least precision - + VDSO @@ -541,7 +541,7 @@ xml:id="dbdoclet.50655246_33489"> Virtual dynamic shared object - + VE @@ -574,7 +574,7 @@ xml:id="dbdoclet.50655246_33489"> Vector scalar extension - + XE @@ -583,7 +583,7 @@ xml:id="dbdoclet.50655246_33489"> FPSCR register. - + XER @@ -591,7 +591,7 @@ xml:id="dbdoclet.50655246_33489"> Fixed-Point Exception Register - + XNOR @@ -599,7 +599,7 @@ xml:id="dbdoclet.50655246_33489"> Exclusive NOR - + XOR @@ -607,7 +607,7 @@ xml:id="dbdoclet.50655246_33489"> Exclusive OR - + ZE diff --git a/specification/bk_main.xml b/specification/bk_main.xml index d2784d5..5b31356 100644 --- a/specification/bk_main.xml +++ b/specification/bk_main.xml @@ -50,8 +50,8 @@ - - + + Copyright details are filled in by the template. @@ -86,7 +86,38 @@ - Revision 1.1b - initial conversion from framemaker + Revision 1.2b - initial conversion from framemaker + + + + + + 2016-06-13 + + + + Version 1.2 - POWER8 erratum and POWER9 support. + + + + + + + 2015-07-16 + + + + Version 1.1 - initial conversion from framemaker + + + + + + 2014-07-21 + + + + Version 1.0 initial release. 
diff --git a/specification/ch_1.xml b/specification/ch_1.xml index f951492..a06ee9f 100644 --- a/specification/ch_1.xml +++ b/specification/ch_1.xml @@ -1,52 +1,49 @@ + xmlns:xi="http://www.w3.org/2001/XInclude" + xmlns:xl="http://www.w3.org/1999/xlink" + version="5.0" xml:lang="en" + xml:id="ch_1"> Introduction -
- Introduction - The Executable and Linking Format (ELF) defines a linking interface - for executables and shared objects in two parts. - - - The first part is the generic System V ABI ( - - + + The Executable and Linking Format (ELF) defines a linking interface + for executables and shared objects in two parts. + + + The first part is the generic System V ABI + (http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/normativerefs.html#NORMATIVEREFSSECT). + + + + The second part is a processor-specific supplement. + + + This document, the OpenPOWER ABI for Linux Supplement for the Power + Architecture 64-bit ELF V2 ABI, is the OpenPOWER-compliant + processor-specific supplement for use with ELF V2 on 64-bit IBM Power + Architecture® systems. This is not a complete System V ABI supplement + because it does not define any library interfaces. + This document establishes both big-endian and little-endian + application binary interfaces (see + ). + OpenPOWER-compliant processors in the 64-bit Power Architecture can execute + in either big-endian or little-endian mode. Executables and + executable-generated data (in general) that subscribes to either byte + ordering is not portable to a system running in the other mode. + + + Note: + This ABI specification does not + address little-endian byte ordering before Power ISA 2.03. + + + + The OpenPOWER ELF V2 ABI is not the same as either the Power + Architecture 32-bit ABI supplement or the 64-bit IBM PowerPC® ELF ABI (ELF + V1). + The Power Architecture 64-bit OpenPOWER ELF V2 ABI supplement is + intended to use the same structural layout now followed in practice by + other processor-specific ABIs. - - http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/normativerefs.html#NORMATIVEREFSSECT - - ). - - - The second part is a processor-specific supplement. 
- - - This document, the OpenPOWER ABI for Linux Supplement for the Power - Architecture 64-bit ELF V2 ABI, is the OpenPOWER-compliant - processor-specific supplement for use with ELF V2 on 64-bit IBM Power - Architecture® systems. This is not a complete System V ABI supplement - because it does not define any library interfaces. - This document establishes both big-endian and little-endian - application binary interfaces (see - ). - OpenPOWER-compliant processors in the 64-bit Power Architecture can execute - in either big-endian or little-endian mode. Executables and - executable-generated data (in general) that subscribes to either byte - ordering is not portable to a system running in the other mode. - - - Note: - - http://www.power.org/ - - - - The OpenPOWER ELF V2 ABI is not the same as either the Power - Architecture 32-bit ABI supplement or the 64-bit IBM PowerPC® ELF ABI (ELF - V1). - The Power Architecture 64-bit OpenPOWER ELF V2 ABI supplement is - intended to use the same structural layout now followed in practice by - other processor-specific ABIs. -
Reference Documentation The archetypal ELF ABI is described by the System V ABI. @@ -60,8 +57,7 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> Power Instruction Set Architecture, Version 3.0, IBM, 2016. - - http://www.power.org + http://www.power.org @@ -70,19 +66,14 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> DWARF Debugging Information Format, Version 4, DWARF Debugging Information Format Workgroup, 2010. - - http://dwarfstd.org/Dwarf4Std.php - + http://dwarfstd.org/Dwarf4Std.php ISO/IEC 9899:2011: Programming languages—C. - - - - http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=57853 + http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=57853 @@ -90,10 +81,7 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> Itanium C++ ABI: Exception Handling. Rev 1.22, CodeSourcery, 2001. - - - - http://www.codesourcery.com/public/cxx-abi/abi-eh.html + http://www.codesourcery.com/public/cxx-abi/abi-eh.html @@ -104,10 +92,7 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> programming language C to support decimal floating-point arithmetic, ISO/IEC, January 05, 2009. Available from ISO. - - - - http://www.iso.org/iso/home/store/catalogue_tc/catalogue_tc_browse.htm?commid=45202 + http://www.iso.org/iso/home/store/catalogue_tc/catalogue_tc_browse.htm?commid=45202 @@ -116,8 +101,7 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> ELF Handling for Thread-Local Storage, Version 0.20, Ulrich Drepper, Red Hat Inc., December 21, 2005. - - http://people.redhat.com/drepper/tls.pdf + http://people.redhat.com/drepper/tls.pdf @@ -130,10 +114,7 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> 64-bit PowerPC ELF Application Binary Interface Supplement 1.9. 
- - - - http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html + http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html @@ -169,10 +150,7 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> ALTIVEC PIM: AltiVec™ Technology Programming Interface Manual, Freescale Semiconductor, 1999. - - - - http://www.freescale.com/files/32bit/doc/ref_manual/ALTIVECPIM.pdf + http://www.freescale.com/files/32bit/doc/ref_manual/ALTIVECPIM.pdf diff --git a/specification/ch_2.xml b/specification/ch_2.xml index 019bedf..522fb05 100644 --- a/specification/ch_2.xml +++ b/specification/ch_2.xml @@ -73,9 +73,11 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> , byte numbers are indicated in the upper corners, and bit numbers are indicated in the lower corners. - +
+ + Little-Endian Bit and Byte Numbering Example @@ -91,12 +93,12 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - +   - +   @@ -113,9 +115,11 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- + + + Little-Endian Bit and Byte Numbering in Halfwords @@ -164,8 +168,10 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- + + + Little-Endian Bit and Byte Numbering in Words @@ -298,8 +304,10 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- + + + Little-Endian Bit and Byte Numbering in Doublewords @@ -546,10 +554,11 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- - + + + Little-Endian Bit and Byte Numbering in Quadwords @@ -563,6 +572,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> + @@ -718,7 +728,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - +   @@ -832,7 +842,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - +   @@ -976,7 +986,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - + LSB @@ -1032,9 +1042,11 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> , byte numbers are indicated in the upper corners, and bit numbers are indicated in the lower corners. - +
+ + Big-Endian Bit and Byte Numbering Example @@ -1050,7 +1062,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - +   @@ -1067,9 +1079,11 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- + + + Big-Endian Bit and Byte Numbering in Halfwords @@ -1142,9 +1156,11 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- + + + Big-Endian Bit and Byte Numbering in Words @@ -1277,9 +1293,11 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- + + + Big-Endian Bit and Byte Numbering in Doublewords @@ -1526,9 +1544,11 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- Big-Endian Bit and Byte Numbering in Quadwords + + @@ -1697,7 +1717,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - +   @@ -1727,13 +1747,13 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - LSB + - + 32 @@ -1811,7 +1831,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - +   @@ -2009,15 +2029,15 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - FPSCR Formats: As of Power ISA version 2.05, the + FPSCR Formats: As of Power ISA version 2.05, the FPSCR is extended from 32 bits to 64 bits. The fields of the original 32-bit FPSCR are now held in bits 32 - 63 of the 64-bit FPSCR. The assembly instructions that operate upon the 64-bit FPSCR have either a W instruction field added to select the operative word for the instruction (for example, - mtfsfi) or the instruction is extended to + mtfsfi) or the instruction is extended to operate upon the entire 64-bit FPSCR, (for example, - mffs). Fields of the FPSCR that represent 1 or + mffs). Fields of the FPSCR that represent 1 or more bits are referred to by field number with an indication of the operative word rather than by bit number. @@ -2039,8 +2059,9 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> shall return the value corresponding to a data type starting at the specified address when accessed with either the pointer dereference operator * or the array reference operator []. - +
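The FPSCR description above uses the Power ISA convention of numbering bits from the most-significant end. Under that convention, bits 32 - 63 of the 64-bit FPSCR are its low-order word. The C sketch below shows one way to turn that numbering into shifts and masks. The helper names are hypothetical. The position of the binary rounding-mode (RN) field, bits 62 - 63, comes from the Power ISA rather than from this text.

```c
#include <stdint.h>

/* Power ISA numbering: in a 64-bit register, bit 0 is the MSB and
 * bit 63 is the LSB. */

/* Mask for the single IBM-numbered bit n (0..63) of a doubleword. */
static inline uint64_t ibm_bit64(unsigned n)
{
    return UINT64_C(1) << (63u - n);
}

/* Extract IBM-numbered bits first..last (inclusive), right-justified.
 * Assumes first <= last <= 63. */
static inline uint64_t ibm_field64(uint64_t value, unsigned first, unsigned last)
{
    unsigned width = last - first + 1u;
    uint64_t mask = (width == 64u) ? UINT64_MAX : ((UINT64_C(1) << width) - 1u);
    return (value >> (63u - last)) & mask;
}

/* The fields of the original 32-bit FPSCR live in bits 32-63 of the
 * 64-bit FPSCR, that is, in its low-order word. */
static inline uint32_t fpscr_low_word(uint64_t fpscr)
{
    return (uint32_t)ibm_field64(fpscr, 32, 63);
}

/* Binary floating-point rounding mode (RN): FPSCR bits 62-63. */
static inline unsigned fpscr_rn(uint64_t fpscr)
{
    return (unsigned)ibm_field64(fpscr, 62, 63);
}
```

For example, fpscr_rn(0x3) is 3. fpscr_low_word() drops the high-order word that Power ISA 2.05 added.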
+ Scalar Types @@ -2695,7 +2716,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> 64- 1. - + @@ -2712,7 +2733,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> Vector of 1 unsigned quadword. - + @@ -2729,7 +2750,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> Vector of 1 signed quadword. - + @@ -2793,6 +2814,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> bits set to 1. , is undefined for all vector elements. +   Decimal Floating-Point (ISO TR 24732 Support) The decimal floating-point data type is used to specify variables @@ -2890,9 +2912,9 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
+   IBM EXTENDED PRECISION - IBM EXTENDED PRECISION Type @@ -2951,6 +2973,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
+   IEEE BINARY 128 EXTENDED PRECISION @@ -3064,12 +3087,14 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
+   IBM EXTENDED PRECISION && IEEE BINARY 128 EXTENDED PRECISION Availability of the long double data type is subject to conformance to a long double standard where the IBM EXTENDED PRECISION format and the IEEE BINARY 128 EXTENDED PRECISION format are mutually exclusive. +   IEEE BINARY 128 EXTENDED PRECISION || IBM EXTENDED PRECISION This ABI provides the following choices for implementation of @@ -3079,62 +3104,66 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> IEEE BINARY 128 EXTENDED PRECISION - - - Long double is implemented as an IEEE 128-bit quad-precision - binary floating-point type in accordance with the applicable IEEE - floating-point standards. - - - Support is provided for all IEEE standard features. - - - IEEE128 quad-precision values are passed in VMX parameter - registers. - - - With some compilers, _Float128 can be used to access IEEE128 - independent of the floating-point representation chosen for the - long double ISO C type. However, this is not part of the C - standard. + + + Long double is implemented as an IEEE 128-bit quad-precision + binary floating-point type in accordance with the applicable IEEE + floating-point standards. + + + Support is provided for all IEEE standard features. + + + IEEE128 quad-precision values are passed in VMX parameter + registers. + + + With some compilers, _Float128 can be used to access IEEE128 + independent of the floating-point representation chosen for the + long double ISO C type. However, this is not part of the C + standard. + + IBM EXTENDED PRECISION - - - Support is provided for the IBM EXTENDED PRECISION format. In - this format, double-precision numbers with different magnitudes - that do not overlap provide an effective precision of 106 bits or - more, depending on the value. The high-order double-precision value - (the one that comes first in storage) must have the larger - magnitude. 
The high-order double-precision value must equal the sum - of the two values, rounded to nearest double (the Linux convention, - unlike AIX). - - - IBM EXTENDED PRECISION form provides the same range as double - precision (about 10 - -308 to 10 - 308) but more precision (a variable amount, - about 31 decimal digits or more). - - - As the absolute value of the magnitude decreases (near the - denormal range), the precision available in the low-order double - also decreases. - - - When the value represented is in the subnormal or denormal - range, this representation provides no more precision than 64-bit - (double) floating-point. - - - The actual number of bits of precision can vary. If the - low-order part is much less than one unit of least precision (ULP) - of the high-order part, significant bits (all 0s or all 1s) are - implied between the significands of high-order and low-order - numbers. Some algorithms that rely on having a fixed number of bits - in the significand can fail when using extended precision. + + + Support is provided for the IBM EXTENDED PRECISION format. In + this format, double-precision numbers with different magnitudes + that do not overlap provide an effective precision of 106 bits or + more, depending on the value. The high-order double-precision value + (the one that comes first in storage) must have the larger + magnitude. The high-order double-precision value must equal the sum + of the two values, rounded to nearest double (the Linux convention, + unlike AIX). + + + IBM EXTENDED PRECISION form provides the same range as double + precision (about 10 + -308 to 10 + 308) but more precision (a variable amount, + about 31 decimal digits or more). + + + As the absolute value of the magnitude decreases (near the + denormal range), the precision available in the low-order double + also decreases. 
+ + + When the value represented is in the subnormal or denormal + range, this representation provides no more precision than 64-bit + (double) floating-point. + + + The actual number of bits of precision can vary. If the + low-order part is much less than one unit of least precision (ULP) + of the high-order part, significant bits (all 0s or all 1s) are + implied between the significands of high-order and low-order + numbers. Some algorithms that rely on having a fixed number of bits + in the significand can fail when using extended precision. + + This implementation differs from the IEEE 754 Standard in the @@ -3195,7 +3224,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - +
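The IBM EXTENDED PRECISION layout described above is a pair of doubles, with the high-order part first in storage. The C sketch below models that pair for illustration only. It does not show how a compiler implements long double. The struct and helper names are hypothetical. The sketch builds a canonical pair with Knuth's TwoSum error-free transformation, which is a standard technique and not something this ABI mandates. It then checks the Linux rule that the high-order part equals the sum of both parts rounded to nearest double.

```c
/* Hypothetical model of the IBM EXTENDED PRECISION storage format. */
typedef struct {
    double hi; /* larger magnitude; must equal hi + lo rounded to double */
    double lo; /* smaller, non-overlapping low-order part */
} ibm_ldouble;

/* Knuth's TwoSum: hi = fl(a + b) and lo = the exact rounding error, so
 * a + b == hi + lo exactly (assumes round-to-nearest, no overflow). */
static inline ibm_ldouble two_sum(double a, double b)
{
    ibm_ldouble r;
    double s = a + b;
    double bb = s - a;
    r.hi = s;
    r.lo = (a - (s - bb)) + (b - bb);
    return r;
}

/* Linux convention: the high-order part is the rounded sum of both parts. */
static inline int ibm_ldouble_is_canonical(ibm_ldouble x)
{
    return x.hi + x.lo == x.hi;
}
```

two_sum(1.0, 0x1p-60) yields the pair {1.0, 0x1p-60}. That pair represents 1 + 2^-60 exactly, which no single double can hold. The pair {1.0, 1.0} breaks the convention, because its high-order part is not the rounded sum.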
Structure with No Padding @@ -3205,7 +3234,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- +
Structure with Internal Padding @@ -3215,7 +3244,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- +
Structure with Internal and Tail Padding @@ -3225,7 +3254,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- +
Structure with Vector Element and Internal Padding @@ -3235,7 +3264,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- +
Structure with Vector Element and Tail Padding @@ -3245,7 +3274,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- +
Structure with Internal Padding and Vector Element @@ -3255,7 +3284,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- +
Structure with Internal Padding and 128-Bit Integer @@ -3265,7 +3294,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- +
Packed Structure @@ -3275,7 +3304,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- +
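The structure figures show natural alignment adding internal and tail padding. The layouts can be checked with offsetof and sizeof under two assumptions: each scalar is aligned to its own size, and a structure takes the alignment of its most strictly aligned member. The type names below are hypothetical. The same values also hold on other LP64 ABIs such as x86-64.

```c
#include <stddef.h>
#include <stdalign.h>

/* Internal padding: 4 bytes follow 'a' so that 'dd' starts on an 8-byte
 * boundary (same shape as the sparm type in the parameter example). */
struct internal_pad {
    int a;      /* offset 0, size 4 */
    double dd;  /* offset 8, size 8 */
};              /* size 16, alignment 8 */

/* Tail padding: the size is rounded up to a multiple of the structure's
 * alignment (8) so that array elements stay aligned. */
struct tail_pad {
    double d;   /* offset 0 */
    char c;     /* offset 8, followed by 7 bytes of tail padding */
};              /* size 16, alignment 8 */
```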
Union Allocation @@ -3296,7 +3325,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> w - 1 to 2^(w-1) - 1 and an unsigned range goes from 0 to 2^w - 1. - + Bit Field Types @@ -3449,9 +3478,11 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> , the little-endian byte offsets are given in the upper right corners, and the bit numbers are given in the lower corners. - +
+ + Little-Endian Bit Numbering for 0x01020304 @@ -3702,9 +3733,11 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> , the big-endian byte offsets are given in the upper left corners, and the bit numbers are given in the lower corners. - +
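The two 0x01020304 figures differ only in which byte sits at the lowest offset. In little-endian mode, offset 0 holds 0x04. In big-endian mode, offset 0 holds 0x01. The minimal C sketch below (helper names hypothetical) reads the host's layout through memcpy, which avoids aliasing problems.

```c
#include <stdint.h>
#include <string.h>

/* Byte stored at memory offset i (0..3) of a 32-bit value on this host. */
static inline unsigned char byte_at_offset(uint32_t value, unsigned i)
{
    unsigned char bytes[4];
    memcpy(bytes, &value, sizeof bytes);
    return bytes[i];
}

/* 1 if the least-significant byte is stored at the lowest address. */
static inline int host_is_little_endian(void)
{
    return byte_at_offset(0x01020304u, 0) == 0x04;
}
```

On a little-endian OpenPOWER system, byte_at_offset(0x01020304, 0) is 0x04. On a big-endian system it is 0x01. Data written in one byte order therefore reads differently in the other mode.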
+ + Big-Endian Bit Numbering for 0x01020304 @@ -3951,11 +3984,11 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- + The byte offsets for structure and union members are shown in through . - +
Simple Bit Field Allocation @@ -3965,8 +3998,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- - +
Bit Field Allocation with Boundary Alignment @@ -3976,8 +4008,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- - +
Bit Field Allocation with Storage Unit Sharing @@ -3987,8 +4018,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- - +
Bit Field Allocation in a Union @@ -3998,8 +4028,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- - +
Bit Field Allocation with Unnamed Bit Fields @@ -4009,7 +4038,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
- + , the alignment of the structure is not affected by the unnamed short and int fields. The @@ -4054,8 +4083,8 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> give an overview of the registers that are global during program execution. The tables use three terms to describe register preservation rules: - - + + @@ -4131,7 +4160,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> general-purpose registers, each 64 bits wide. Throughout this document the symbol rN is used, where N is a register number, to refer to general-purpose register N. - + Register Roles @@ -4367,6 +4396,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
+   TOC Pointer Usage As described in @@ -4404,6 +4434,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> non-ABI compliant code is not guaranteed to be portable and supported in all systems. For examples of compliant and noncompliant code, see . +   Optional Function Linkage Except as follows, a function cannot depend on the values of @@ -4427,6 +4458,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> . +   Stack Frame Pointer The stack pointer always points to the lowest allocated valid stack frame. It must maintain quadword alignment and grow toward the @@ -4435,15 +4467,17 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> maintain back chains. A called function is permitted to decrement it if required. For more information, see . +   Link Register The link register contains the address that a called function normally returns to. It is volatile across function calls. +   Condition Register Fields In the condition register, the bit fields CR2, CR3, and CR4 are nonvolatile. The value on entry must be restored on exit. The other bit fields are volatile. This ABI requires OpenPOWER-compliant processors to implement - mfocr instructions in a manner that initializes + mfocr instructions in a manner that initializes undefined bits of the RT result register of mfocr instructions to one of the following values: @@ -4454,27 +4488,27 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> The architected value of the corresponding CR field in the - mfocr instruction + mfocr instruction - - - - POWER8 Erratum: When executing an - mfocr instruction, the POWER8 processor does not - implement the behavior described in the "Fixed-Point Invalid Forms - and Undefined Conditions" section of - POWER8 Processor User's Manual for the Single-Chip - Module. Instead, it replicates the selected condition - register field within the byte that contains it rather than - initializing to 0 the bits corresponding to the nonselected bits of - the byte that contains it. 
When generating code to save two condition - register fields that are stored in the same byte, the compiler must - mask the value received from - mfocr to avoid corruption of the resulting - (partial) condition register word. - This erratum does not apply to the POWER9 processor. - + + + NoteErratum: + When executing an + mfocr instruction, the POWER8 processor does not + implement the behavior described in the "Fixed-Point Invalid Forms + and Undefined Conditions" section of + POWER8 Processor User's Manual for the Single-Chip + Module. Instead, it replicates the selected condition + register field within the byte that contains it rather than + initializing to 0 the bits corresponding to the nonselected bits of + the byte that contains it. When generating code to save two condition + register fields that are stored in the same byte, the compiler must + mask the value received from + mfocr to avoid corruption of the resulting + (partial) condition register word. + This erratum does not apply to the POWER9 processor. + For more information, see @@ -4494,43 +4528,48 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> 32-register subset of 64 bits per register. They can also be addressed with VMX instructions to refer to a 32-register subset of 128-bit wide registers. - + Floating-Point Registers as Part of VSRs - - + + - + - + + - - + + VSR(0) FPR[0] - - + + + - + + - + VSR(1) FPR1] - - + + + - + + - + @@ -4541,62 +4580,71 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> ... - - + + + - + + - + VSR(30) FPR[30] - - + + + - + + - + VSR(31) FP[31] - - + + + - + + - + VSR(32) - - + + + - + + - - - + VSR(33) - + + + + - - - + - + + + ... @@ -4604,29 +4652,32 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> ... + - - - + VSR(62) - + + + + - - VSR(63) - + + + + - - + + @@ -4641,92 +4692,105 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> 255 +
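The masking required by the POWER8 mfocr erratum can be shown with plain integer arithmetic. The C sketch below is a hypothetical model, not compiler output. It treats the condition register as a 32-bit image in which field CRn occupies bits 4n to 4n+3 in Power ISA (most-significant-first) numbering. It simulates the erratum by assuming that a single-field read of CR2 or CR3 also leaves a copy of that field in the other nibble of their shared byte.

```c
#include <stdint.h>

/* Mask for field CRn (n = 0..7) in a 32-bit CR image: CR0 is the
 * high-order nibble, CR7 the low-order nibble. */
static inline uint32_t cr_field_mask(unsigned n)
{
    return UINT32_C(0xF0000000) >> (4u * n);
}

/* Keep only CRn from an mfocr-style read, discarding anything else left
 * in the result (on POWER8, a copy of CRn in the same byte). */
static inline uint32_t keep_cr_field(uint32_t mfocr_result, unsigned n)
{
    return mfocr_result & cr_field_mask(n);
}

/* Partial CR image for the nonvolatile fields CR2 and CR3, which share a
 * byte, built from two separate single-field reads. */
static inline uint32_t save_cr2_cr3(uint32_t read_cr2, uint32_t read_cr3)
{
    return keep_cr_field(read_cr2, 2) | keep_cr_field(read_cr3, 3);
}
```

Take CR2 = 0x8 and CR3 = 0x2. The simulated POWER8 reads return 0x00880000 and 0x00220000. OR-ing them without masks gives 0x00AA0000, which corrupts both fields. save_cr2_cr3() gives the intended 0x00820000.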
- - + +
Vector Registers as Part of VSRs - - - - + + + + - + VSR(0) - - + + + - + + - + VSR(1) - - + + + - + + - + - + ... ... - + + - + + - + VSR(30) - - + + + - + + - + VSR(31) - - + + + - + + - + VSR(32) - + VR[0] + - + VSR(33) - + VR[1] + - + - + ... @@ -4734,22 +4798,25 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> ... + - + VSR(62) - + VR[30] + VSR(63) - + VR[31] + @@ -4758,12 +4825,10 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> 0 - - - 127 + @@ -4776,7 +4841,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> For the purpose of function calls, the right half of VSX registers, corresponding to the classic floating-point registers (that is, vsr0 - vsr31), is volatile. - +
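The two VSR tables above overlay the classic register files on the 64 vector-scalar registers. FPR[n] is doubleword 0 of VSR(n), and VR[n] is all of VSR(n + 32). The hypothetical helpers below express that mapping. They are an illustration only, not an assembler or compiler API.

```c
/* VSR that contains classic floating-point register FPR[n], n = 0..31. */
static inline unsigned vsr_of_fpr(unsigned n) { return n; }

/* VSR that is vector register VR[n], n = 0..31. */
static inline unsigned vsr_of_vr(unsigned n) { return n + 32u; }

/* VR overlaid by a VSR, or -1 for VSR(0)..VSR(31), which overlay FPRs. */
static inline int vr_of_vsr(unsigned vsr)
{
    return vsr >= 32u ? (int)(vsr - 32u) : -1;
}
```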
Floating-Point Register Roles for Binary Floating-Point Types @@ -4856,6 +4921,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
+   DFP Support The OpenPOWER ABI supports the decimal floating-point (DFP) format and DFP language extensions. The default implementation of DFP @@ -4891,7 +4957,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> register pair. When a floating-point register is skipped during input parameter allocation, words in the corresponding GPR or memory doubleword in the parameter list are not skipped. - + Floating-Point Register Roles for Decimal Floating-Point Types @@ -4941,7 +5007,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> vector-scalar register file, and a special-purpose register VSCR. Throughout this document, the symbol vN is used, where N is a register number, to refer to vector register N. - +
Vector Register Roles @@ -5029,11 +5095,13 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
+   IEEE BINARY 128 EXTENDED PRECISION Parameters in IEEE BINARY 128 EXTENDED PRECISION format shall be passed in a single 128-bit vector register as if they were vector values. +   IBM EXTENDED PRECISION Parameters in the IBM EXTENDED PRECISION format with a pair of @@ -5048,7 +5116,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> The Power ISA identifies a number of registers that have mutability limited to the specific bit fields indicated in the following list: - + @@ -5137,6 +5205,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> function without permission to change the limited-access bits across a function call shall save the value of the register before modifying the bits and restore it before returning to its calling function. +   Limited-Access Conditions Standard library functions expressly defined to change the state of limited-access bits are not constrained by nonvolatile preservation @@ -5233,7 +5302,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> default, the stack pointer always points to the back-chain word of the most recently allocated stack frame. For more information, see . - +
Stack Frame Organization @@ -5311,7 +5380,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> An example of a minimum stack frame allocation that meets these requirements is shown in . - +
Minimum Stack Frame Allocation with and without Back Chain @@ -5339,6 +5408,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> generation. A system shall provided a programmatic interface to query unwind information when system-wide unwind capabilities are provided. +   CR Save Word If a function changes the value in any nonvolatile field of the condition register, it shall first save at least the value of those @@ -5346,16 +5416,19 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> function exit. The caller frame CR Save Word may be used as the save location. This location in the current frame may be used as temporary storage, which is volatile over function calls. +   Reserved Word This word is reserved for system functions. Modifications of the value contained in this word are prohibited unless explicitly allowed by future ABI amendments. +   LR Save Doubleword If a function changes the value of the link register, it must first save the old value to restore before function exit. The caller frame LR Save Doubleword may be used as the save location. This location in the current frame may be used as temporary storage, which is volatile over a function call. +   TOC Pointer Doubleword If a function changes the value of the TOC pointer register, it shall first save it in the TOC pointer doubleword. @@ -5375,6 +5448,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> might be necessary above the Vector Register Save Area to provide 16-byte alignment, as shown in . +   Floating-Point Register Save Area If a function changes the value in any nonvolatile floating-point register fN, it shall first save the value in fN in the Floating-Point @@ -5384,6 +5458,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> the number of floating-point registers that must be saved. If no floating-point registers are to be saved, the Floating-Point Register Save Area has a zero size. 
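The CR Save Word, Reserved Word, LR Save Doubleword, and TOC Pointer Doubleword described above sit at fixed offsets from the stack pointer. The struct below is a hypothetical C model of that fixed header. The offsets (0, 8, 12, 16, and 24, with the Parameter Save Area starting at 32) come from the ELF V2 stack frame layout and are not stated in this text. Check them against the Stack Frame Organization figure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the fixed header at the lowest addresses of a
 * stack frame; offsets are from r1 after the frame is allocated. */
struct elfv2_frame_header {
    uint64_t back_chain; /*  0: back chain word (caller's stack pointer) */
    uint32_t cr_save;    /*  8: CR Save Word */
    uint32_t reserved;   /* 12: Reserved Word, for system functions */
    uint64_t lr_save;    /* 16: LR Save Doubleword */
    uint64_t toc_save;   /* 24: TOC Pointer Doubleword */
};                       /* 32: Parameter Save Area, if any, starts here */

_Static_assert(offsetof(struct elfv2_frame_header, cr_save) == 8, "CR Save Word at 8");
_Static_assert(offsetof(struct elfv2_frame_header, reserved) == 12, "Reserved Word at 12");
_Static_assert(offsetof(struct elfv2_frame_header, lr_save) == 16, "LR Save Doubleword at 16");
_Static_assert(offsetof(struct elfv2_frame_header, toc_save) == 24, "TOC Pointer Doubleword at 24");
_Static_assert(sizeof(struct elfv2_frame_header) == 32, "32-byte minimum frame");
```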
+   General-Purpose Register Save Area If a function changes the value in any nonvolatile @@ -5404,6 +5479,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> upon the number of general registers that must be saved. If no general-purpose registers are to be saved, the General-Purpose Register Save Area has a zero size. +   Vector Register Save Area If a function changes the value in any nonvolatile vector register vN, it shall first save the value in vN in the Vector Register @@ -5426,6 +5502,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> must be saved. It ranges from 0 bytes to a maximum of 192 bytes (12 X 16). If no vector registers are to be saved, the Vector Register Save Area has a zero size. +   Local Variable Space The Local Variable Space is used for allocation of local variables. The Local Variable Space is located immediately above the @@ -5438,6 +5515,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> The Local Variable Space also contains any parameters that need to be assigned a memory address when the function's parameter list does not require a save area to be allocated by the caller. +   Parameter Save Area The Parameter Save Area shall be allocated by the caller for @@ -5603,8 +5681,10 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> Pad an aggregate or union smaller than one doubleword in - size, but having a non-zero size, so that it is in the - least-significant bits of the doubleword. Pad all others, if + size, but having a non-zero size, + so that it is in the + least-significant bits of the doubleword. + Pad all others, if necessary, at their tail. Variable size aggregates or unions are passed by reference. @@ -5771,12 +5851,13 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> a number of GPRs are skipped, in allocation order, commensurate to the size of the corresponding in-memory representation of the passed argument's type. - The parameter size is always rounded up to the next multiple of a - doubleword. 
- + The parameter size is always rounded up to the next multiple of a + doubleword. + Consequently, each parameter of a non-zero size is allocated to at least one doubleword. +   Full doubleword rule: When a doubleword in the Parameter Save Area (or its GPR copy) contains at least a portion of a structure, that doubleword must contain @@ -5784,11 +5865,13 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> can either be completely valid, or completely invalid, but not partially valid and invalid, except in the last doubleword where invalid padding may be present.) +   IEEE BINARY 128 EXTENDED PRECISION Up to 12 quad-precision parameters can be passed in v2 - v13. For the purpose of determining qualified floating-point and vector arguments, an IEEE 128b type shall be considered a "like" vector type, and a complex _Float128 shall be treated as two individual scalar elements. +   IBM EXTENDED PRECISION IBM EXTENDED PRECISION format parameters are passed as if they were a struct consisting of separate double parameters. @@ -5861,7 +5944,8 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> buffer, set gr = 4; else set gr = 3. - Set fr = 1 Set vr = 2 + Set fr = 1 +Set vr = 2 SCAN: If there are no more arguments, terminate. 
Otherwise, @@ -5869,47 +5953,130 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> argument: - switch(class(argument)) unnamed parameter: if gr > - 10 goto mem_argument size = size_in_DW(argument) reg_size = min(size, - 11-gr) pass (GPR, gr, first_n_DW (argument, reg_size)); if - remaining_members argument = after_n_DW(argument,reg_size)) goto - mem_argument break; integer: // up to 64b pointer: // this also - includes all pass by reference values if gr > 10 goto mem_argument - pass (GPR, gr, argument); gr++ break; aggregate: if - (homogeneous(argument,float) and regs_needed(members(argument)) <= - 8) if (register_type_used (type (argument)) == vr) goto use_vrs; - n_fregs = n_fregs_for_type(member_type(argument,0)) agg_size = - members(argument) * n_fregs reg_size = min(agg_size, 15-fr) - pass(FPR,fr,first_n_DW(argument,reg_size) fr += reg_size; gr += - size_in_DW (first_n_DW(argument,reg_size)) if remaining_members - argument = after_n_DW(argument,reg_size)) goto gpr_struct break; if - (homogeneous(argument,vector) and members(argument) <= 8) use_vrs: - agg_size = members(argument) reg_size = min(agg_size, 14-vr) if - (gr&1 = 0) // align vector in memory gr++ - pass(VR,vr,first_n_elements(argument,reg_size); vr += reg_size gr += - size_in_DW (first_n_elements(argument,reg_size) if remaining_members - argument = after_n_elements(argument,reg_size)) goto gpr_struct break; - if gr > 10 goto mem_argument size = size_in_DW(argument) gpr_struct: - reg_size = min(size, 11-gr) pass (GPR, gr, first_n_DW (argument, - reg_size)); gr += size_in_DW (first_n_DW (argument, reg_size)) if - remaining_members argument = after_n_DW(argument,reg_size)) goto - mem_argument break; float: // float is passed in one FPR. // double is - passed in one FPR. // IBM EXTENDED PRECISION is passed in the next two - FPRs. // IEEE BINARY 128 EXTENDED PRECISION is passed in one VR. // - _Decimal32 is passed in the lower half of one FPR. // _Decimal64 is - passed in one FPR. 
// _Decimal128 is passed in an even-odd FPR pair, - skipping an FPR if necessary. if (register_type_used (type (argument)) - == vr) // Assumes == vr is true for IEEE BINARY 128 EXTENDED PRECISION. - goto use_vr; fr += align_pad(fr,type(argument)) // Assumes align_pad = - 8 for _Decimal128 if fr is odd; otherwise = 0. if fr > 14 goto - mem_argument n_fregs = n_fregs_for_type(argument) // Assumes - n_fregs_for_type == 2 for IBM EXTENDED PRECISION or _Decimal128, == 1 - for float, double, _Decimal32 or _Decimal64. pass(FPR,fr,argument) fr - += n_fregs gr += size_in_DW(argument) break; vector: Use vr: if vr > - 13 goto mem_argument if (gr&1 = 0) // align vector in memory gr++ - pass(VR,vr,argument) vr ++ gr += 2 break; next argument; mem_argument: - need_save_area = TRUE pass (stack, gr, argument) gr += - size_in_DW(argument) next argument; + switch(class(argument)) + +unnamed parameter: + if gr > 10 + goto mem_argument + + size = size_in_DW(argument) + reg_size = min(size, 11-gr) + pass (GPR, gr, first_n_DW (argument, reg_size)); + + if remaining_members + argument = after_n_DW(argument,reg_size)) + goto mem_argument + + break; + +integer: // up to 64b +pointer: // this also includes all pass by reference values + + if gr > 10 + goto mem_argument + pass (GPR, gr, argument); + gr++ + + break; + +aggregate: + if (homogeneous(argument,float) and regs_needed(members(argument)) <=8) + if (register_type_used (type (argument)) == vr) + goto use_vrs; + n_fregs = n_fregs_for_type(member_type(argument,0)) + agg_size = members(argument) * n_fregs + reg_size = min(agg_size, 15-fr) + pass(FPR,fr,first_n_DW(argument,reg_size) + fr += reg_size; + gr += size_in_DW (first_n_DW(argument,reg_size)) + + if remaining_members + argument = after_n_DW(argument,reg_size)) + goto gpr_struct + + break; + + if (homogeneous(argument,vector) and members(argument) <= 8) + use_vrs: + agg_size = members(argument) + reg_size = min(agg_size, 14-vr) + if (gr&1 = 0) // align vector in memory + gr++ + 
pass(VR,vr,first_n_elements(argument,reg_size)); + vr += reg_size + gr += size_in_DW (first_n_elements(argument,reg_size)) + + if remaining_members + argument = after_n_elements(argument,reg_size) + goto gpr_struct + break; + + if gr > 10 + goto mem_argument + + size = size_in_DW(argument) + +gpr_struct: + reg_size = min(size, 11-gr) + pass (GPR, gr, first_n_DW (argument, reg_size)); + gr += size_in_DW (first_n_DW (argument, reg_size)) + + if remaining_members + argument = after_n_DW(argument,reg_size) + goto mem_argument + + break; + +float: + +// float is passed in one FPR. +// double is passed in one FPR. +// IBM EXTENDED PRECISION is passed in the next two FPRs. +// IEEE BINARY 128 EXTENDED PRECISION is passed in one VR. +// _Decimal32 is passed in the lower half of one FPR. +// _Decimal64 is passed in one FPR. +// _Decimal128 is passed in an even-odd FPR pair, skipping an FPR if necessary. + + if (register_type_used (type (argument)) == vr) + // Assumes == vr is true for IEEE BINARY 128 EXTENDED PRECISION. + goto use_vr; + + fr += align_pad(fr,type(argument)) + // Assumes align_pad = 8 for _Decimal128 if fr is odd; otherwise = 0. + if fr > 14 + goto mem_argument + + n_fregs = n_fregs_for_type(argument) + // Assumes n_fregs_for_type == 2 for IBM EXTENDED PRECISION + // or _Decimal128, == 1 for float, double, _Decimal32 or _Decimal64. + pass(FPR,fr,argument) + fr += n_fregs + gr += size_in_DW(argument) + + break; + +vector: + use_vr: + if vr > 13 + goto mem_argument + + if (gr&1 = 0) // align vector in memory + gr++ + + pass(VR,vr,argument) + vr ++ + gr += 2 + + break; + +next argument; + +mem_argument: + need_save_area = TRUE + pass (stack, gr, argument) + gr += size_in_DW(argument) + +next argument; All complex data types are handled as if two scalar values of the base type were passed as separate parameters. If the callee takes the address of any of its parameters, values @@ -5941,14 +6108,27 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> memory.
Passing Arguments in GPRs, FPRs, and Memory - typedef struct { int a; double dd; } sparm; sparm s, - t; int c, d, e; long double ld;/* IBM EXTENDED PRECISION format */ - double ff, gg, hh; x = func(c, ff, d, ld, s, gg, t, e, hh); Parameter - Register Offset in parameter save area c r3 0-7 (not stored in - parameter save area) ff f1 8-15 (not stored) d r5 16-23 (not stored) - ld f2,f3 24-39 (not stored) s r8,r9 40-55 (not stored) gg f4 56-63 - (not stored) t (none) 64-79 (stored in parameter save area) e (none) - 80-87 (stored) hh f5 88-95 (not stored) + typedef struct { + int a; + double dd; +} sparm; +sparm s, t; +int c, d, e; +long double ld;/* IBM EXTENDED PRECISION format */ +double ff, gg, hh; + +x = func(c, ff, d, ld, s, gg, t, e, hh); + +Parameter Register Offset in parameter save area +c r3 0-7 (not stored in parameter save area) +ff f1 8-15 (not stored) +d r5 16-23 (not stored) +ld f2,f3 24-39 (not stored) +s r8,r9 40-55 (not stored) +gg f4 56-63 (not stored) +t (none) 64-79 (stored in parameter save area) +e (none) 80-87 (stored) +hh f5 88-95 (not stored)
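The table above can be cross-checked against a deliberately reduced model of the assignment pseudocode. The C sketch below handles only the argument classes that appear in the func example: integers, double, IBM EXTENDED PRECISION long double, and a 16-byte non-homogeneous struct such as sparm. The names arg_class, placement, and assign_args are hypothetical. It assumes that f1 through f13 carry floating-point arguments, as the tables in this section show. It does not model FPR exhaustion, homogeneous aggregates, vectors, or a struct split between r10 and memory; for a struct it records only the first register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument classes: only the ones used by the func example. */
enum arg_class {
    ARG_INT,      /* integer or pointer: one doubleword            */
    ARG_DOUBLE,   /* double: one FPR, one doubleword               */
    ARG_IBM_LD,   /* IBM EXTENDED PRECISION: two FPRs, two DWs     */
    ARG_STRUCT16  /* 16-byte non-homogeneous struct: two DWs       */
};

/* Where one argument was placed. */
struct placement {
    char kind;   /* 'r' = GPR, 'f' = FPR, 'm' = memory only             */
    int reg;     /* first register used, or -1 when passed in memory    */
    int offset;  /* byte offset of its slot in the parameter save area  */
};

static int size_in_dw(enum arg_class c)
{
    return (c == ARG_IBM_LD || c == ARG_STRUCT16) ? 2 : 1;
}

/* Reduced model of the pseudocode: gr starts at r3, fr at f1. */
static void assign_args(const enum arg_class *args, size_t n,
                        struct placement *out)
{
    int gr = 3;
    int fr = 1;

    for (size_t i = 0; i < n; i++) {
        enum arg_class c = args[i];

        /* The save-area slot always follows gr, even for FPR arguments. */
        out[i].offset = (gr - 3) * 8;
        out[i].kind = 'm';
        out[i].reg = -1;

        if (c == ARG_DOUBLE || c == ARG_IBM_LD) {
            int n_fregs = (c == ARG_IBM_LD) ? 2 : 1;
            if (fr + n_fregs - 1 <= 13) {   /* assumption: f1..f13 */
                out[i].kind = 'f';
                out[i].reg = fr;
                fr += n_fregs;
            }
        } else if (gr <= 10) {              /* GPR arguments use r3..r10 */
            out[i].kind = 'r';
            out[i].reg = gr;
        }

        /* Every argument consumes doublewords of the save area. */
        gr += size_in_dw(c);
    }
}
```

Calling assign_args with the nine classes of func, in order (int, double, int, long double, sparm, double, sparm, int, double), gives r3/0, f1/8, r5/16, f2/24, r8/40, f4/56, memory/64, memory/80, and f5/88, which agrees with the table. The point the model makes visible is that gr advances for every argument, including those passed in FPRs, so each argument keeps a reserved doubleword offset even when it is never stored.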
If a prototype is not in scope: @@ -5980,12 +6160,24 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> are used in the remaining examples of parameter passing.
Parameter Passing Definitions - typedef struct { double a double b; } dpfp2; typedef - struct float a float b; } spfp2; double a1,a4; dpfp2 a2,a3 ; spfp - a6,a7; double func2 (double a, dpfp2 p1, dpfp p2, double b, int x); - double func3 (double a, dpfp2 p1, dpfp p2, double b, int x, spfp2 - p3,spfpp4); struct three_floats { float a,b,c;} struct two_floats { - float a,b;} + typedef struct { + double a; + double b; +} dpfp2; + +typedef struct { + float a; + float b; +} spfp2; + +double a1,a4; +dpfp2 a2,a3; +spfp2 a6,a7; +double func2 (double a, dpfp2 p1, dpfp2 p2, double b, int x); +double func3 (double a, dpfp2 p1, dpfp2 p2, double b, int x, spfp2 p3, spfp2 p4); + +struct three_floats { float a,b,c; }; +struct two_floats { float a,b; };
shows how parameters are @@ -5996,9 +6188,15 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
Passing Homogeneous Floating-Point Aggregates and Integer Parameters in Registers without a Parameter Save Area - x = func2(a1,a2,a3,a4, 5); Parameter Register Offset - in parameter save area a1 f1 n/a a2.a f2 n/a a2.b f3 n/a a3.a f4 n/a - a3.b f5 n/a a4 f6 n/a 5 r9 n/a + x = func2(a1,a2,a3,a4, 5); +Parameter Register Offset in parameter save area +a1 f1 n/a +a2.a f2 n/a +a2.b f3 n/a +a3.a f4 n/a +a3.b f5 n/a +a4 f6 n/a +5 r9 n/a
shows how parameters are @@ -6008,10 +6206,19 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
Passing Homogeneous Floating-Point Aggregates and Integer Parameters in Registers without a Parameter Save Area - x = func3(a1,a2,a3,a4, 5,a6,a7); Parameter Register - Offset in parameter save area a1 f1 n/a a2.a f2 n/a a2.b f3 n/a a3.a - f4 n/a a3.b f5 n/a a4 f6 n/a 5 r9 n/a a6.a f7 n/a a6.b f8 n/a a7.a f9 - n/a a7.b f10 n/a + x = func3(a1,a2,a3,a4, 5,a6,a7); +Parameter Register Offset in parameter save area +a1 f1 n/a +a2.a f2 n/a +a2.b f3 n/a +a3.a f4 n/a +a3.b f5 n/a +a4 f6 n/a +5 r9 n/a +a6.a f7 n/a +a6.b f8 n/a +a7.a f9 n/a +a7.b f10 n/a
shows how parameters are @@ -6022,15 +6229,25 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
Passing Floating-Point Scalars and Homogeneous Floating-Point Aggregates in Registers and Memory - x = oddity (float d1, float d2, float d3, float d4, - float d5, float d6, float d7, float d8, float d9, float d10, float - d11, float d12, struct three_floats x) Parameter Register Offset in - parameter save area d1 f1 0 (not stored) d2 f2 8 (not stored) d3 f3 - 16 (not stored) d4 f4 24 (not stored) d5 f5 32 (not stored) d6 f6 40 - (not stored) d7 f7 48 (not stored) d8 f8 56 (not stored) d9 f9 64 - (not stored) d10 f10 72 (not stored) d11 f11 80 (not stored) d12 f12 - 88 (not stored) x.a f13 96 (store because of no partial DW rule) x.b - - 100 (stored) x.c - 104 (stored) + x = oddity (float d1, float d2, float d3, float d4, float d5, + float d6, float d7, float d8, float d9, float d10, + float d11, float d12, struct three_floats x) +Parameter Register Offset in parameter save area +d1 f1 0 (not stored) +d2 f2 8 (not stored) +d3 f3 16 (not stored) +d4 f4 24 (not stored) +d5 f5 32 (not stored) +d6 f6 40 (not stored) +d7 f7 48 (not stored) +d8 f8 56 (not stored) +d9 f9 64 (not stored) +d10 f10 72 (not stored) +d11 f11 80 (not stored) +d12 f12 88 (not stored) +x.a f13 96 (stored because of no partial DW rule) +x.b - 100 (stored) +x.c - 104 (stored)
shows how parameters are @@ -6042,13 +6259,27 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
Passing Floating-Point Scalars and Homogeneous Floating-Point Aggregates in FPRs and GPRs without a Parameter Save Area - x = oddity2 (struct two_floats s1, struct two_floats - s2, struct two_floats s3, struct two_floats s4, struct two_floats s5, - struct two_floats s6, struct two_floats s7, struct two_floats s8) - Parameter Register Offset in parameter save area s1.a f1 n/a s1.b f2 - n/a s2.a f3 n/a s2.b f4 n/a s3.a f5 n/a s3.b f6 n/a s4.a f7 n/a s4.b - f8 n/a s5.a f9 n/a s5.b f10 n/a s6.a f11 n/a s6.b f12 n/a s7.a f13 - n/a s7.b - n/a s7 gpr9 n/a s8 gpr10 n/a + x = oddity2 (struct two_floats s1, struct two_floats s2, + struct two_floats s3, struct two_floats s4, + struct two_floats s5, struct two_floats s6, + struct two_floats s7, struct two_floats s8) +Parameter Register Offset in parameter save area +s1.a f1 n/a +s1.b f2 n/a +s2.a f3 n/a +s2.b f4 n/a +s3.a f5 n/a +s3.b f6 n/a +s4.a f7 n/a +s4.b f8 n/a +s5.a f9 n/a +s5.b f10 n/a +s6.a f11 n/a +s6.b f12 n/a +s7.a f13 n/a +s7.b - n/a +s7 gpr9 n/a +s8 gpr10 n/a
shows how parameters are @@ -6061,17 +6292,27 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
Passing Homogeneous Floating-Point Aggregates in FPRs, GPRs, and Memory with a Parameter Save Area - x = oddity3 (struct two_floats s1, struct two_floats - s2, struct two_floats s3, struct two_floats s4, struct two_floats s5, - struct two_floats s6, struct two_floats s7, struct two_floats s8, - struct two_floats s9) Parameter Register Offset in parameter save - area s1.a f1 0 (not stored) s1.b f2 4 (not stored) s2.a f3 8 (not - stored) s2.b f4 12 (not stored) s3.a f5 16 (not stored) s3.b f6 20 - (not stored) s4.a f7 24 (not stored) s4.b f8 28 (not stored) s5.a f9 - 32 (not stored) s5.b f10 36 (not stored) s6.a f11 40 (not stored) - s6.b f12 44 (not stored) s7.a f13 48 (not stored, SPFP in FPR) s7.b - - 52 (not stored) s7 gpr9 48 (not stored, full gpr) s8 gpr10 56 (not - stored, full gpr) s9 64 (stored) + x = oddity3 (struct two_floats s1, struct two_floats s2, struct two_floats s3, + struct two_floats s4, struct two_floats s5, struct two_floats s6, + struct two_floats s7, struct two_floats s8, struct two_floats s9) +Parameter Register Offset in parameter save area +s1.a f1 0 (not stored) +s1.b f2 4 (not stored) +s2.a f3 8 (not stored) +s2.b f4 12 (not stored) +s3.a f5 16 (not stored) +s3.b f6 20 (not stored) +s4.a f7 24 (not stored) +s4.b f8 28 (not stored) +s5.a f9 32 (not stored) +s5.b f10 36 (not stored) +s6.a f11 40 (not stored) +s6.b f12 44 (not stored) +s7.a f13 48 (not stored, SPFP in FPR) +s7.b - 52 (not stored) +s7 gpr9 48 (not stored, full gpr) +s8 gpr10 56 (not stored, full gpr) +s9 64 (stored)
shows how parameters are @@ -6079,10 +6320,14 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> FPRs. In this figure, a Parameter Save Area is not allocated.
Passing Vector Data Types without Parameter Save Area - x =func4(int s1, vector float s2, float s3, vector - int s4, vector char s5) Parameter Register Offset in parameter save - area s1 gpr3 n/a s2 v2 n/a s3 f1 n/a s4 v3 n/a s5 v4 - n/a + x = func4(int s1, vector float s2, float s3, vector int s4, + vector char s5) +Parameter Register Offset in parameter save area +s1 gpr3 n/a +s2 v2 n/a +s3 f1 n/a +s4 v3 n/a +s5 v4 n/a
shows how parameters are @@ -6090,11 +6335,15 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> FPRs. In this figure, a Parameter Save Area is allocated.
Passing Vector Data Types with a Parameter Save Area - x =func5(int s1, vector float s2, float s3, vector - int s4, int s5, char s6) Parameter Register Offset in parameter save - area s1 gpr3 0 (not stored) s2 v2 16 (not stored) s3 f1 32 (not - stored) s4 v3 48 (not stored) s5 - 64 (stored) s6 - 72 - (stored) + x = func5(int s1, vector float s2, float s3, vector int s4, + int s5, char s6) +Parameter Register Offset in parameter save area +s1 gpr3 0 (not stored) +s2 v2 16 (not stored) +s3 f1 32 (not stored) +s4 v3 48 (not stored) +s5 - 64 (stored) +s6 - 72 (stored)
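The vector case of the pseudocode aligns gr before passing a vector ("if (gr&1 = 0) gr++"). Because gr starts at 3, an odd gr corresponds to a save-area offset (gr - 3) * 8 that is a multiple of 16, which is why s2 and s4 land at offsets 16 and 48 rather than 8 and 40. The C sketch below is a reduced model covering only int, float, and vector arguments; vkind, vslot, and place_args are hypothetical names. It assumes v2 through v13 carry vectors and f1 through f13 carry floats. It also assumes that a vector spilled to memory is aligned the same way, because the pseudocode shows the alignment only on the register path.

```c
#include <assert.h>
#include <stddef.h>

enum vkind { VK_INT, VK_FLOAT, VK_VECTOR };

struct vslot {
    char where;   /* 'r' GPR, 'f' FPR, 'v' VR, 'm' memory only */
    int reg;      /* register number, or -1 for memory only    */
    int offset;   /* byte offset in the parameter save area    */
};

static void place_args(const enum vkind *args, size_t n, struct vslot *out)
{
    int gr = 3, fr = 1, vr = 2;   /* next r3..r10, f1..f13, v2..v13 */

    for (size_t i = 0; i < n; i++) {
        out[i].where = 'm';
        out[i].reg = -1;

        if (args[i] == VK_VECTOR) {
            /* Odd gr <=> (gr - 3) * 8 is a multiple of 16. */
            if ((gr & 1) == 0)
                gr++;
            out[i].offset = (gr - 3) * 8;
            if (vr <= 13) {
                out[i].where = 'v';
                out[i].reg = vr++;
            }
            gr += 2;              /* a vector occupies two doublewords */
        } else {
            out[i].offset = (gr - 3) * 8;
            if (args[i] == VK_FLOAT && fr <= 13) {
                out[i].where = 'f';
                out[i].reg = fr++;
            } else if (args[i] == VK_INT && gr <= 10) {
                out[i].where = 'r';
                out[i].reg = gr;
            }
            gr += 1;              /* scalars occupy one doubleword */
        }
    }
}
```

Running place_args on the func5 argument kinds (int, vector, float, vector, int, char treated as int) gives r3/0, v2/16, f1/32, v3/48, memory/64, and memory/72, in line with the table. The same model places the func4 arguments in r3, v2, f1, v3, and v4.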
When a function takes the address of at least one of its arguments, it is the callee's responsibility to store function @@ -6154,7 +6403,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> Aggregates that are not returned by value are returned in a storage buffer provided by the caller. The address is provided as a hidden first input argument in general-purpose register r3. - + Quadword decimal floating-point return values shall be returned in the first paired floating-point register parameter pair; that is, @@ -6233,8 +6482,12 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> instructions: - lis r16, symbol@ha ld r12, symbol@l(r16) lis r16, - symbol2@ha addi r16, r16, symbol2@l lvx v1, r0, r16 + lis r16, symbol@ha +ld r12, symbol@l(r16) + +lis r16, symbol2@ha +addi r16, r16, symbol2@l +lvx v1, r0, r16 By instantiating the TOC pointer in r2 and using TOC-pointer @@ -6242,16 +6495,24 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> .) - <load TOC base to r2> ld r12, symbol@toc(r2) li - r16, symbol2@toc lvx v1, r2, r16 + <load TOC base to r2> +ld r12, symbol@toc(r2) + +li r16, symbol2@toc +lvx v1, r2, r16 By instantiating the TOC pointer in r2 and using GOT-indirect addressing: - <load TOC base to r2> ld r12, symbol@got(r2) ld - r12, 0(r12) ld r12, symbol2@got(r2) lvx v1, 0, r12 + <load TOC base to r2> + +ld r12, symbol@got(r2) +ld r12, 0(r12) + +ld r12, symbol2@got(r2) +lvx v1, 0, r12 In the OpenPOWER ELF V2 ABI, position-dependent code built with this addressing scheme may have a Global Offset Table (GOT) in the data segment that holds addresses. (For more information, see @@ -6325,8 +6586,12 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> relative addressing (for private data). 
- <load TOC base to r2> ld r12, symbol@toc(r2) li - r16, symbol2@toc lvx v1, r2, r16 + <load TOC base to r2> + +ld r12, symbol@toc(r2) + +li r16, symbol2@toc +lvx v1, r2, r16 By instantiating the TOC pointer in r2 and using GOT-indirect @@ -6334,8 +6599,14 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> sections): - <load TOC base to r2> ld r12, symbol@got(r2) ld - r12, 0(r12) ld r12 symbol2@got(r2) lvx v1, 0, r12 + <load TOC base to r2> + +ld r12, symbol@got(r2) + +ld r12, 0(r12) + +ld r12, symbol2@got(r2) +lvx v1, 0, r12 Position-independent executables or shared objects have a GOT in the data segment that holds addresses. When the system creates a memory image from the file, the GOT entries are updated to reflect the @@ -6432,11 +6703,12 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> For example, while linking a static module with a known load address in the first 2 GB of the address space, the following code sequence may be rewritten: - addis r2, r12, .TOC.-func@ha addi r2, r2, - .TOC.-func@l + addis r2, r12, .TOC.-func@ha +addi r2, r2, .TOC.-func@l It may be rewritten by a linker or assembler to an equivalent form that is faster due to instruction fusion, such as: - lis r2, .TOC.@ha addi r2, r2, .TOC.@l + lis r2, .TOC.@ha +addi r2, r2, .TOC.@l In addition to establishing addressability, the function prologue is responsible for the following functions: @@ -6461,21 +6733,28 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> mfocrf results with an OR instruction; for example, to yield a word in r0 including all three preserved CRs as follows: - mfocrf r0, crf2 mfocrf r1, crf3 or r0, r0, r1 mfocrf - r1, crf4 or r0, r0, r1 + mfocrf r0, crf2 +mfocrf r1, crf3 +or r0, r0, r1 +mfocrf r1, crf4 +or r0, r0, r1 Specifically, this allows each OpenPOWER-compliant processor implementation to set each field to hold either 0 or the correct in-order value of the corresponding CR field at the point where the mfocrf instruction is performed.
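The reason the OR sequence works regardless of which choice a processor makes can be shown with a small model. The C sketch below treats the CR as a 32-bit image with field 0 in the most significant nibble. It models each mfocrf as copying its own field and, for every other field, returning either 0 or that field's correct value, following the allowance quoted above. The per-call masks k2, k3, and k4 choose which fields come back "correct". The names crf_mask, mfocrf_model, and save_cr234 are hypothetical, and a real mfocrf writes a 64-bit GPR, which this model ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Mask selecting CR field n (0..7); field 0 is the most significant nibble. */
static uint32_t crf_mask(int n)
{
    return UINT32_C(0xF) << (28 - 4 * n);
}

/* Model of "mfocrf rT, crfN": field n is always copied. Bit k of 'keep'
 * selects whether field k also reads as its correct value (1) or as 0 (0). */
static uint32_t mfocrf_model(uint32_t cr, int n, unsigned keep)
{
    uint32_t mask = crf_mask(n);
    for (int k = 0; k < 8; k++)
        if (keep & (1u << k))
            mask |= crf_mask(k);
    return cr & mask;
}

/* The sequence from the text: mfocrf, mfocrf, or, mfocrf, or. */
static uint32_t save_cr234(uint32_t cr, unsigned k2, unsigned k3, unsigned k4)
{
    uint32_t r0 = mfocrf_model(cr, 2, k2);
    uint32_t r1 = mfocrf_model(cr, 3, k3);
    r0 |= r1;
    r1 = mfocrf_model(cr, 4, k4);
    r0 |= r1;
    return r0;
}
```

Whatever the implementation returns for the other fields, every bit is either 0 or the true CR bit, so OR-ing never corrupts a field. Fields 2, 3, and 4 of the result therefore always equal the real CR fields, and the other fields of the saved word hold either 0 or correct values.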
+   Assembly Language Syntax for Defining Entry Points When a function has two entry points, the global entry point is defined as a symbol. The local entry point is defined with the .localentry assembler pseudo op. - my_func: addis r2, r12, (.TOC.-my_func)@ha addi r2, r2, - (.TOC.-my_func)@l .localentry my_func, .-my_func ... ; function - definition blr + my_func: + addis r2, r12, (.TOC.-my_func)@ha + addi r2, r2, (.TOC.-my_func)@l + .localentry my_func, .-my_func + ... ; function definition + blr shows how to represent dual entry points in symbol tables in an ELF object file. It also defines @@ -6511,47 +6790,53 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> Before a function calls any other function, it shall establish its own stack frame, whose size shall be a multiple of 16 bytes. - - - In instances where a function's prologue creates a stack - frame, the back-chain word of the stack frame shall be updated - atomically with the value of the stack pointer (r1) when a back - chain is implemented. (This must be supported as default by all ELF - V2 ABI-compliant environments.) This task can be done by using one - of the following Store Doubleword with Update instructions: - - - Store Doubleword with Update instruction with relevant - negative displacement for stack frames that are smaller than 32 - KB - - - Store Doubleword with Update Indexed instruction where the - negative size of the stack frame has been computed, using - addis and - addi or - ori instructions, and then loaded into a - volatile register, for stack frames that are 32 KB or - greater - - - The function shall save the link register that contains its - return address in the LR save doubleword of its caller's stack - frame before calling another function. + + + In instances where a function's prologue creates a stack + frame, the back-chain word of the stack frame shall be updated + atomically with the value of the stack pointer (r1) when a back + chain is implemented. 
(This must be supported as default by all ELF + V2 ABI-compliant environments.) This task can be done by using one + of the following Store Doubleword with Update instructions: + + + Store Doubleword with Update instruction with relevant + negative displacement for stack frames that are smaller than 32 + KB + + + Store Doubleword with Update Indexed instruction where the + negative size of the stack frame has been computed, using + addis and + addi or + ori instructions, and then loaded into a + volatile register, for stack frames that are 32 KB or + greater + + + + + The function shall save the link register that contains its + return address in the LR save doubleword of its caller's stack + frame before calling another function. + + The deallocation of a function's stack frame must be an atomic operation. This task can be accomplished by one of the following methods: - - - Increment the stack pointer by the identical value that it - was originally decremented by in the prologue when the stack frame - was created. - - - Load the stack pointer (r1) with the value in the back-chain - word in the stack frame, if a back chain is present. + + + Increment the stack pointer by the identical value that it + was originally decremented by in the prologue when the stack frame + was created. + + + Load the stack pointer (r1) with the value in the back-chain + word in the stack frame, if a back chain is present. 
+ + The calling sequence does not restrict how languages leverage @@ -6625,26 +6910,53 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> A sample implementation of _savegpr0_ N and _restgpr0_ N follows: - _ - savegpr0_14: std r14,-144(r1) _savegpr0_15: std - r15,-136(r1) _savegpr0_16: std r16,-128(r1) _savegpr0_17: std - r17,-120(r1) _savegpr0_18: std r18,-112(r1) _savegpr0_19: std - r19,-104(r1) _savegpr0_20: std r20,-96(r1) _savegpr0_21: std - r21,-88(r1) _savegpr0_22: std r22,-80(r1) _savegpr0_23: std r23,-72(r1) - _savegpr0_24: std r24,-64(r1) _savegpr0_25: std r25,-56(r1) - _savegpr0_26: std r26,-48(r1) _savegpr0_27: std r27,-40(r1) - _savegpr0_28: std r28,-32(r1) _savegpr0_29: std r29,-24(r1) - _savegpr0_30: std r30,-16(r1) _savegpr0_31: std r31,-8(r1) std r0, - 16(r1) blr _restgpr0_14: ld r14,-144(r1) _restgpr0_15: ld r15,-136(r1) - _restgpr0_16: ld r16,-128(r1) _restgpr0_17: ld r17,-120(r1) - _restgpr0_18: ld r18,-112(r1) _restgpr0_19: ld r19,-104(r1) - _restgpr0_20: ld r20,-96(r1) _restgpr0_21: ld r21,-88(r1) _restgpr0_22: - ld r22,-80(r1) _restgpr0_23: ld r23,-72(r1) _restgpr0_24: ld - r24,-64(r1) _restgpr0_25: ld r25,-56(r1) _restgpr0_26: ld r26,-48(r1) - _restgpr0_27: ld r27,-40(r1) _restgpr0_28: ld r28,-32(r1) _restgpr0_29: - ld r0, 16(r1) ld r29,-24(r1) mtlr r0 ld r30,-16(r1) ld r31,-8(r1) blr - _restgpr0_30: ld r30,-16(r1) _restgpr0_31: ld r0, 16(r1) ld r31,-8(r1) - mtlr r0 blr + _savegpr0_14: std r14,-144(r1) + _savegpr0_15: std r15,-136(r1) + _savegpr0_16: std r16,-128(r1) + _savegpr0_17: std r17,-120(r1) + _savegpr0_18: std r18,-112(r1) + _savegpr0_19: std r19,-104(r1) + _savegpr0_20: std r20,-96(r1) + _savegpr0_21: std r21,-88(r1) + _savegpr0_22: std r22,-80(r1) + _savegpr0_23: std r23,-72(r1) + _savegpr0_24: std r24,-64(r1) + _savegpr0_25: std r25,-56(r1) + _savegpr0_26: std r26,-48(r1) + _savegpr0_27: std r27,-40(r1) + _savegpr0_28: std r28,-32(r1) + _savegpr0_29: std r29,-24(r1) + _savegpr0_30: std r30,-16(r1) + _savegpr0_31: std r31,-8(r1) + std 
r0, 16(r1) + blr + + _restgpr0_14: ld r14,-144(r1) + _restgpr0_15: ld r15,-136(r1) + _restgpr0_16: ld r16,-128(r1) + _restgpr0_17: ld r17,-120(r1) + _restgpr0_18: ld r18,-112(r1) + _restgpr0_19: ld r19,-104(r1) + _restgpr0_20: ld r20,-96(r1) + _restgpr0_21: ld r21,-88(r1) + _restgpr0_22: ld r22,-80(r1) + _restgpr0_23: ld r23,-72(r1) + _restgpr0_24: ld r24,-64(r1) + _restgpr0_25: ld r25,-56(r1) + _restgpr0_26: ld r26,-48(r1) + _restgpr0_27: ld r27,-40(r1) + _restgpr0_28: ld r28,-32(r1) + _restgpr0_29: ld r0, 16(r1) + ld r29,-24(r1) + mtlr r0 + ld r30,-16(r1) + ld r31,-8(r1) + blr + _restgpr0_30: ld r30,-16(r1) + _restgpr0_31: ld r0, 16(r1) + ld r31,-8(r1) + mtlr r0 + blr Each _savegpr1_N routine saves the general registers from rN - r31, inclusive. When the routine is called, r12 contains the address of the word just beyond the end of the general register save area. @@ -6654,25 +6966,45 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> normal use of r12 on a call. A sample implementation of _savegpr1_N and _restgpr1_N follows: - _savegpr1_14: std r14,-144(r12) _savegpr1_15: std - r15,-136(r12) _savegpr1_16: std r16,-128(r12) _savegpr1_17: std - r17,-120(r12) _savegpr1_18: std r18,-112(r12) _savegpr1_19: std - r19,-104(r12) _savegpr1_20: std r20,-96(r12) _savegpr1_21: std - r21,-88(r12) _savegpr1_22: std r22,-80(r12) _savegpr1_23: std - r23,-72(r12) _savegpr1_24: std r24,-64(r12) _savegpr1_25: std - r25,-56(r12) _savegpr1_26: std r26,-48(r12) _savegpr1_27: std - r27,-40(r12) _savegpr1_28: std r28,-32(r12) _savegpr1_29: std - r29,-24(r12) _savegpr1_30: std r30,-16(r12) _savegpr1_31: std - r31,-8(r12) blr _restgpr1_14: ld r14,-144(r12) _restgpr1_15: ld - r15,-136(r12) _restgpr1_16: ld r16,-128(r12) _restgpr1_17: ld - r17,-120(r12) _restgpr1_18: ld r18,-112(r12) _restgpr1_19: ld - r19,-104(r12) _restgpr1_20: ld r20,-96(r12) _restgpr1_21: ld - r21,-88(r12) _restgpr1_22: ld r22,-80(r12) _restgpr1_23: ld - r23,-72(r12) _restgpr1_24: ld r24,-64(r12) _restgpr1_25: ld - 
r25,-56(r12) _restgpr1_26: ld r26,-48(r12) _restgpr1_27: ld - r27,-40(r12) _restgpr1_28: ld r28,-32(r12) _restgpr1_29: ld - r29,-24(r12) _restgpr1_30: ld r30,-16(r12) _restgpr1_31: ld r31,-8(r12) - blr + _savegpr1_14: std r14,-144(r12) + _savegpr1_15: std r15,-136(r12) + _savegpr1_16: std r16,-128(r12) + _savegpr1_17: std r17,-120(r12) + _savegpr1_18: std r18,-112(r12) + _savegpr1_19: std r19,-104(r12) + _savegpr1_20: std r20,-96(r12) + _savegpr1_21: std r21,-88(r12) + _savegpr1_22: std r22,-80(r12) + _savegpr1_23: std r23,-72(r12) + _savegpr1_24: std r24,-64(r12) + _savegpr1_25: std r25,-56(r12) + _savegpr1_26: std r26,-48(r12) + _savegpr1_27: std r27,-40(r12) + _savegpr1_28: std r28,-32(r12) + _savegpr1_29: std r29,-24(r12) + _savegpr1_30: std r30,-16(r12) + _savegpr1_31: std r31,-8(r12) + blr + + _restgpr1_14: ld r14,-144(r12) + _restgpr1_15: ld r15,-136(r12) + _restgpr1_16: ld r16,-128(r12) + _restgpr1_17: ld r17,-120(r12) + _restgpr1_18: ld r18,-112(r12) + _restgpr1_19: ld r19,-104(r12) + _restgpr1_20: ld r20,-96(r12) + _restgpr1_21: ld r21,-88(r12) + _restgpr1_22: ld r22,-80(r12) + _restgpr1_23: ld r23,-72(r12) + _restgpr1_24: ld r24,-64(r12) + _restgpr1_25: ld r25,-56(r12) + _restgpr1_26: ld r26,-48(r12) + _restgpr1_27: ld r27,-40(r12) + _restgpr1_28: ld r28,-32(r12) + _restgpr1_29: ld r29,-24(r12) + _restgpr1_30: ld r30,-16(r12) + _restgpr1_31: ld r31,-8(r12) + blr
FPR Save and Restore Functions @@ -6697,25 +7029,54 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> A sample implementation of _savefpr_ N and _restfpr_ N follows: - _savefpr_14: stfd f14,-144(r1) _savefpr_15: stfd - f15,-136(r1) _savefpr_16: stfd f16,-128(r1) _savefpr_17: stfd - f17,-120(r1) _savefpr_18: stfd f18,-112(r1) _savefpr_19: stfd - f19,-104(r1) _savefpr_20: stfd f20,-96(r1) _savefpr_21: stfd - f21,-88(r1) _savefpr_22: stfd f22,-80(r1) _savefpr_23: stfd f23,-72(r1) - _savefpr_24: stfd f24,-64(r1) _savefpr_25: stfd f25,-56(r1) - _savefpr_26: stfd f26,-48(r1) _savefpr_27: stfd f27,-40(r1) - _savefpr_28: stfd f28,-32(r1) _savefpr_29: stfd f29,-24(r1) - _savefpr_30: stfd f30,-16(r1) _savefpr_31: stfd f31,-8(r1) std r0, - 16(r1) blr _restfpr_14: lfd f14,-144(r1) _restfpr_15: lfd f15,-136(r1) - _restfpr_16: lfd f16,-128(r1) _restfpr_17: lfd f17,-120(r1) - _restfpr_18: lfd f18,-112(r1) _restfpr_19: lfd f19,-104(r1) - _restfpr_20: lfd f20,-96(r1) _restfpr_21: lfd f21,-88(r1) _restfpr_22: - lfd f22,-80(r1) _restfpr_23: lfd f23,-72(r1) _restfpr_24: lfd - f24,-64(r1) _restfpr_25: lfd f25,-56(r1) _restfpr_26: lfd f26,-48(r1) - _restfpr_27: lfd f27,-40(r1) _restfpr_28: lfd f28,-32(r1) _restfpr_29: - ld r0, 16(r1) lfd f29,-24(r1) mtlr r0 lfd f30,-16(r1) lfd f31,-8(r1) - blr _restfpr_30: lfd f30,-16(r1) _restfpr_31: ld r0, 16(r1) lfd - f31,-8(r1) mtlr r0 blr + _savefpr_14: stfd f14,-144(r1) + _savefpr_15: stfd f15,-136(r1) + _savefpr_16: stfd f16,-128(r1) + _savefpr_17: stfd f17,-120(r1) + _savefpr_18: stfd f18,-112(r1) + _savefpr_19: stfd f19,-104(r1) + _savefpr_20: stfd f20,-96(r1) + _savefpr_21: stfd f21,-88(r1) + _savefpr_22: stfd f22,-80(r1) + _savefpr_23: stfd f23,-72(r1) + _savefpr_24: stfd f24,-64(r1) + _savefpr_25: stfd f25,-56(r1) + _savefpr_26: stfd f26,-48(r1) + _savefpr_27: stfd f27,-40(r1) + _savefpr_28: stfd f28,-32(r1) + _savefpr_29: stfd f29,-24(r1) + _savefpr_30: stfd f30,-16(r1) + _savefpr_31: stfd f31,-8(r1) + std r0, 16(r1) + blr + + _restfpr_14: lfd 
f14,-144(r1) + _restfpr_15: lfd f15,-136(r1) + _restfpr_16: lfd f16,-128(r1) + _restfpr_17: lfd f17,-120(r1) + _restfpr_18: lfd f18,-112(r1) + _restfpr_19: lfd f19,-104(r1) + _restfpr_20: lfd f20,-96(r1) + _restfpr_21: lfd f21,-88(r1) + _restfpr_22: lfd f22,-80(r1) + _restfpr_23: lfd f23,-72(r1) + _restfpr_24: lfd f24,-64(r1) + _restfpr_25: lfd f25,-56(r1) + _restfpr_26: lfd f26,-48(r1) + _restfpr_27: lfd f27,-40(r1) + _restfpr_28: lfd f28,-32(r1) + _restfpr_29: ld r0, 16(r1) + lfd f29,-24(r1) + mtlr r0 + lfd f30,-16(r1) + lfd f31,-8(r1) + blr + + _restfpr_30: lfd f30,-16(r1) + _restfpr_31: ld r0, 16(r1) + lfd f31,-8(r1) + mtlr r0 + blr
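The sample routines above rely on two properties: register n (14 to 31) always lives at a fixed negative offset, -8 * (32 - n), from the base register, and each entry label falls through the stores for all higher-numbered registers before the single blr. The base register is r1 for _savegpr0_N and _savefpr_N and r12 for _savegpr1_N. The _savegpr0_N and _savefpr_N variants additionally store r0 at 16(r1), and the matching mtlr r0 in the restore routines indicates that r0 carries the return address. The C sketch below models only the offset arithmetic and the fall-through. The names save_offset and simulate_save are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of register n's slot from the base register, for the 8-byte
 * GPR and FPR save/restore routines. Valid for n = 14..31. */
static int save_offset(int n)
{
    assert(n >= 14 && n <= 31);
    return -8 * (32 - n);
}

/* Model of entering at label _save..._<first>: control falls through the
 * stores for first, first+1, ..., 31, then returns. area[] stands for the
 * 144 bytes just below the base register, so base + save_offset(n) maps
 * to area[(144 + save_offset(n)) / 8]. */
static void simulate_save(int first, const uint64_t regs[32], uint64_t area[18])
{
    assert(first >= 14 && first <= 31);
    for (int n = first; n <= 31; n++)
        area[(144 + save_offset(n)) / 8] = regs[n];
}
```

For example, save_offset(14) is -144 and save_offset(31) is -8, matching the first and last stores in each listing. A function that only needs r28 through r31 can call the _28 entry point and reuse the same code; the lower slots of the area are left untouched.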
Vector Save and Restore Functions @@ -6741,27 +7102,58 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> _restgpr1_M. A sample implementation of _savevr_M and _restvr_M follows: - _savevr_20: addi r12,r0,-192 stvx v20,r12,r0 # save v20 - _savevr_21: addi r12,r0,-176 stvx v21,r12,r0 # save v21 _savevr_22: - addi r12,r0,-160 stvx v22,r12,r0 # save v22 _savevr_23: addi - r12,r0,-144 stvx v23,r12,r0 # save v23 _savevr_24: addi r12,r0,-128 - stvx v24,r12,r0 # save v24 _savevr_25: addi r12,r0,-112 stvx v25,r12,r0 - # save v25 _savevr_26: addi r12,r0,-96 stvx v26,r12,r0 # save v26 - _savevr_27: addi r12,r0,-80 stvx v27,r12,r0 # save v27 _savevr_28: addi - r12,r0,-64 stvx v28,r12,r0 # save v28 _savevr_29: addi r12,r0,-48 stvx - v29,r12,r0 # save v29 _savevr_30: addi r12,r0,-32 stvx v30,r12,r0 # - save v30 _savevr_31: addi r12,r0,-16 stvx v31,r12,r0 # save v31 blr # - return to epilogue _restvr_20: addi r12,r0,-192 lvx v20,r12,r0 # - restore v20 _restvr_21: addi r12,r0,-176 lvx v21,r12,r0 # restore v21 - _restvr_22: addi r12,r0,-160 lvx v22,r12,r0 # restore v22 _restvr_23: - addi r12,r0,-144 lvx v23,r12,r0 # restore v23 _restvr_24: addi - r12,r0,-128 lvx v24,r12,r0 # restore v24 _restvr_25: addi r12,r0,-112 - lvx v25,r12,r0 # restore v25 _restvr_26: addi r12,r0,-96 lvx v26,r12,r0 - # restore v26 _restvr_27: addi r12,r0,-80 lvx v27,r12,r0 # restore v27 - _restvr_28: addi r12,r0,-64 lvx v28,r12,r0 # restore v28 _restvr_29: - addi r12,r0,-48 lvx v29,r12,r0 # restore v29 _restvr_30: addi - r12,r0,-32 lvx v30,r12,r0 # restore v30 _restvr_31: addi r12,r0,-16 lvx - v31,r12,r0 # restore v31 blr #return to epilogue + +_savevr_20: addi r12,r0,-192 + stvx v20,r12,r0 # save v20 +_savevr_21: addi r12,r0,-176 + stvx v21,r12,r0 # save v21 +_savevr_22: addi r12,r0,-160 + stvx v22,r12,r0 # save v22 +_savevr_23: addi r12,r0,-144 + stvx v23,r12,r0 # save v23 +_savevr_24: addi r12,r0,-128 + stvx v24,r12,r0 # save v24 +_savevr_25: addi r12,r0,-112 + stvx v25,r12,r0 # save v25 +_savevr_26: addi 
r12,r0,-96 + stvx v26,r12,r0 # save v26 +_savevr_27: addi r12,r0,-80 + stvx v27,r12,r0 # save v27 +_savevr_28: addi r12,r0,-64 + stvx v28,r12,r0 # save v28 +_savevr_29: addi r12,r0,-48 + stvx v29,r12,r0 # save v29 +_savevr_30: addi r12,r0,-32 + stvx v30,r12,r0 # save v30 +_savevr_31: addi r12,r0,-16 + stvx v31,r12,r0 # save v31 + blr # return to epilogue + +_restvr_20: addi r12,r0,-192 + lvx v20,r12,r0 # restore v20 +_restvr_21: addi r12,r0,-176 + lvx v21,r12,r0 # restore v21 +_restvr_22: addi r12,r0,-160 + lvx v22,r12,r0 # restore v22 +_restvr_23: addi r12,r0,-144 + lvx v23,r12,r0 # restore v23 +_restvr_24: addi r12,r0,-128 + lvx v24,r12,r0 # restore v24 +_restvr_25: addi r12,r0,-112 + lvx v25,r12,r0 # restore v25 +_restvr_26: addi r12,r0,-96 + lvx v26,r12,r0 # restore v26 +_restvr_27: addi r12,r0,-80 + lvx v27,r12,r0 # restore v27 +_restvr_28: addi r12,r0,-64 + lvx v28,r12,r0 # restore v28 +_restvr_29: addi r12,r0,-48 + lvx v29,r12,r0 # restore v29 +_restvr_30: addi r12,r0,-32 + lvx v30,r12,r0 # restore v30 +_restvr_31: addi r12,r0,-16 + lvx v31,r12,r0 # restore v31 + blr #return to epilogue
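Each vector entry point uses two instructions because stvx and lvx accept only an indexed (register plus register) address. In "addi r12,r0,imm", an RA field of r0 means the literal value 0, so r12 simply receives imm = -16 * (32 - M). The store or load then uses (r12) + (r0). The code implies that r0 holds the address just past the end of the vector save area. stvx and lvx ignore the low 4 bits of the effective address, so the slots are where intended only if that address is 16-byte aligned. The C sketch below models this arithmetic; vr_imm and vr_slot are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Immediate loaded by "addi r12,r0,imm" at _savevr_M / _restvr_M.
 * Valid for M = 20..31. */
static int32_t vr_imm(int m)
{
    assert(m >= 20 && m <= 31);
    return -16 * (32 - m);
}

/* Effective address used by "stvx vM,r12,r0" or "lvx vM,r12,r0":
 * (r12) + (r0) with the low 4 bits cleared, as stvx/lvx do. */
static uint64_t vr_slot(uint64_t r0_end, int m)
{
    uint64_t ea = r0_end + (uint64_t)(int64_t)vr_imm(m);
    return ea & ~(uint64_t)15;
}
```

With r0 = 0x1000, v20 is stored at 0xF40 (192 bytes below) and v31 at 0xFF0. If r0 were misaligned, say 0x1008, the hardware would silently round the v31 slot down to 0xFF0, which is why the save area end must be quadword aligned.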
@@ -6850,7 +7242,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - + Absolute Load and Store Example @@ -6873,21 +7265,48 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - extern int src; extern int dst; extern int - *ptr; dst = src; ptr = &dst; *ptr = src; + extern int src; +extern int dst; +extern int *ptr; + +dst = src; + + + +ptr = &dst; + + + +*ptr = src; + + + + - .extern src .extern dst .extern ptr .section - ".text" lis r9,src@ha lwz r9,src@l(r9) lis r11,dst@ha stw - r9,dst@l(r11) lis r11,ptr@ha lis r9,dst@ha la r9,dst@l(r9) std - r9,ptr@l(r11) lis r11,ptr@ha lwz r11,ptr@l(r11) lis r9,src@ha - lwz r9,src@l(r9) stw r9,0(r11) + .extern src +.extern dst +.extern ptr +.section ".text" +lis r9,src@ha +lwz r9,src@l(r9) +lis r11,dst@ha +stw r9,dst@l(r11) +lis r11,ptr@ha +lis r9,dst@ha +la r9,dst@l(r9) +std r9,ptr@l(r11) +lis r11,ptr@ha +ld r11,ptr@l(r11) +lis r9,src@ha +lwz r9,src@l(r9) +stw r9,0(r11)
- + Small Model Position-Independent Load and Store (DSO) @@ -6910,21 +7329,48 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - extern int src; extern int dst; extern int - *ptr; dst = src; ptr = &dst; *ptr = src; + extern int src; +extern int dst; +extern int *ptr; + + +dst = src; + + + +ptr = &dst; + + +*ptr = src; + + + + - .extern src .extern dst .extern ptr .section - ".text" # TOC base in r2 ld r9,src@got(2) lwz r0,0(r9) ld - r9,dst@got(r2) stw r0,0(r9) ld r9,ptr@got(r2) ld r0,dst@got(r2) - std r0,0(r9) ld r9,ptr@got(r2) ld r11,0(r9) ld r9,src@got(r2) - lwz r0,0(r9) stw r0,0(r11) + .extern src +.extern dst +.extern ptr +.section ".text" +# TOC base in r2 +ld r9,src@got(r2) +lwz r0,0(r9) +ld r9,dst@got(r2) +stw r0,0(r9) +ld r9,ptr@got(r2) +ld r0,dst@got(r2) +std r0,0(r9) +ld r9,ptr@got(r2) +ld r11,0(r9) +ld r9,src@got(r2) +lwz r0,0(r9) +stw r0,0(r11)
- + Medium or Large Model Position-Independent Load and Store (DSO) @@ -6948,18 +7394,54 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - extern int src; extern int dst; int *ptr; dst = - src; ptr = & dst; *ptr = src; + extern int src; +extern int dst; +int *ptr; + + +dst = src; + + + + + +ptr = &dst; + + + + +*ptr = src; + + + + + + - .extern src .extern dst .extern ptr .section - ".text" # AssumesTOC pointer in r2 addis r6,r2,src@got@ha ld - r6,src@got@l(r6) addis r7,r2,dst@got@ha ld r7,dst@got@l(r7) lwz - r0,0(r6) stw r0,0(r7) addis r6,r2,dst@got@ha ld - r6,dst@got@l(r6) addis r7,r2,ptr@got@ha ld r7,ptr@got@l(r7) stw - r6,0(r7) addis r6,r2,src@got@ha ld r6,src@got@l(r6) addis - r7,r2,ptr@got@ha ld r7,ptr@got@l(r7) ld r7,0(r7) lwz r0,0(r6) - stw r0,0,(r7) + .extern src +.extern dst +.extern ptr +.section ".text" +# Assumes TOC pointer in r2 +addis r6,r2,src@got@ha +ld r6,src@got@l(r6) +addis r7,r2,dst@got@ha +ld r7,dst@got@l(r7) +lwz r0,0(r6) +stw r0,0(r7) +addis r6,r2,dst@got@ha +ld r6,dst@got@l(r6) +addis r7,r2,ptr@got@ha +ld r7,ptr@got@l(r7) +std r6,0(r7) +addis r6,r2,src@got@ha +ld r6,src@got@l(r6) +addis r7,r2,ptr@got@ha +ld r7,ptr@got@l(r7) +ld r7,0(r7) +lwz r0,0(r6) +stw r0,0(r7) @@ -6989,66 +7471,66 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> Global Offset Table where the value of the symbol is stored is given by the assembly syntax symbol@got. This syntax represents the address of the variable named "symbol." + The offset for this assembly syntax cannot be any larger than 16 + bits. In cases where the offset is greater than 16 bits, the following + assembly syntax is used for offsets up to 32 bits: + + + High (32-bit) adjusted part of the offset: + symbol@got@ha + Causes a linker error if the offset is larger than 32 + bits. + + + High (32-bit) part of the offset: symbol@got@h + Causes a linker error if the offset is larger than 32 + bits.
+ + + Low part of the offset: symbol@got@l + + + To obtain the multiple 16-bit segments of a 64-bit offset, the + following operators may be used: + + + Highest (most-significant 16 bits) adjusted part of the + offset: symbol@highesta + + + Highest (most-significant 16 bits) part of the offset: + symbol@highest + + + Higher (next significant 16 bits) adjusted part of the + offset: symbol@highera + + + Higher (next significant 16 bits) part of the offset: + symbol@higher + + + High (next significant 16 bits) adjusted part of the offset: + symbol@higha + + + High (next significant 16 bits) part of the offset: + symbol@high + + + Low part of the offset: symbol@l + + + If the instruction using symbol@got@ + l has a signed immediate operand (for example, + addi), use symbol@got@ + ha (high adjusted) for the high part of the offset. + If it has an unsigned immediate operand (for example, ori), use + symbol@got@ + h. For a description of high-adjusted values, see + . - The offset for this assembly syntax cannot be any larger than 16 - bits. In cases where the offset is greater than 16 bits, the following - assembly syntax is used for offsets up to 32 bits: - - - High (32-bit) adjusted part of the offset: - symbol@got@ha - Causes a linker error if the offset is larger than 32 - bits. - - - High (32-bit) part of the offset: symbol@got@h - Causes a linker error if the offset is larger than 32 - bits.
- - - Low part of the offset: symbol@got@l - - - To obtain the multiple 16-bit segments of a 64-bit offset, the - following operators may be used: - - - Highest (most-significant 16 bits) adjusted part of the - offset: symbol@highesta - - - Highest (most-significant 16 bits) part of the offset: - symbol@highest - - - Higher (next significant 16 bits) adjusted part of the - offset: symbol@highera - - - Higher (next significant 16 bits) part of the offset: - symbol@higher - - - High (next significant 16 bits) adjusted part of the offset: - symbol@higha - - - High (next significant 16 bits) part of the offset: - symbol@high - - - Low part of the offset: symbol@l - - - If the instruction using symbol@got@ - l has a signed immediate operand (for example, - addi), use symbol@got@ - ha(high adjusted) for the high part of the offset. - If it has an unsigned immediate operand (for example, ori), use - symbol@got@ - h. For a description of high-adjusted values, see - .
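The following C sketch shows how the adjusted operators combine. It models a five-instruction sequence (lis, ori, sldi, oris, addi) that builds a full 64-bit value from @highesta, @highera, @ha, and @l. The operator formulas are an assumption of this sketch: each adds 0x8000 to the value, shifts right by 48, 32, or 16 bits, and keeps 16 bits, following the usual high-adjusted convention. The instruction sequence and helper names are hypothetical and are not mandated by this ABI.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit fields as assumed for @l, @ha, @highera and @highesta.
   The "adjusted" forms add 0x8000 first so that the final
   sign-extended @l immediate is compensated for. */
static uint16_t op_l(uint64_t v)        { return (uint16_t)(v & 0xffff); }
static uint16_t op_ha(uint64_t v)       { return (uint16_t)((v + 0x8000) >> 16); }
static uint16_t op_highera(uint64_t v)  { return (uint16_t)((v + 0x8000) >> 32); }
static uint16_t op_highesta(uint64_t v) { return (uint16_t)((v + 0x8000) >> 48); }

/* Sign-extend a 16-bit immediate the way lis/addi treat it. */
static uint64_t sext16(uint16_t imm)
{
    return (imm & 0x8000) ? ((uint64_t)imm | 0xffffffffffff0000ULL) : imm;
}

/* Simulate a hypothetical sequence that loads a 64-bit constant:
     lis  r,v@highesta
     ori  r,r,v@highera
     sldi r,r,32
     oris r,r,v@ha
     addi r,r,v@l                                                 */
static uint64_t materialize64(uint64_t v)
{
    uint64_t r = sext16(op_highesta(v)) << 16; /* lis: sign-extended imm << 16 */
    r |= op_highera(v);                        /* ori: unsigned OR             */
    r <<= 32;                                  /* sldi 32                      */
    r |= (uint64_t)op_ha(v) << 16;             /* oris: unsigned OR, shifted   */
    r += sext16(op_l(v));                      /* addi: signed add, wraps      */
    return r;
}
```

For example, materialize64(0x123456789abcdef0) returns the same value. If unadjusted fields were used in front of the signed addi, the result would be 0x10000 too small whenever bit 15 of the value is set.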
@@ -7099,10 +7581,12 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> extern void function(); - function(); +function(); - bl function nop + +bl function +nop @@ -7125,7 +7609,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> . The ELF V2 ABI requires the address of the called function to be in r12 when a cross-module function call is made. - +
Indirect Function Call (Absolute Medium Model) @@ -7148,14 +7632,30 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - extern void function(); extern void (*ptrfunc) - (); (*ptrfunc)(); + extern void function(); +extern void (*ptrfunc) (); + +ptrfunc = function; + + + +(*ptrfunc)(); + + + - .section .text lis r11,ptrfunc@ha lis - r9,function@ha ld r9,function@l(r9) std r9,ptrfunc@l(r11) lis - r12,ptrfunc@ha ld r12,ptrfunc@l(r12) mtctr r12 - bctrl + + +.section .text +lis r11,ptrfunc@ha +lis r9,function@ha +addi r9,r9,function@l +std r9,ptrfunc@l(r11) +lis r12,ptrfunc@ha +ld r12,ptrfunc@l(r12) +mtctr r12 +bctrl @@ -7164,7 +7664,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> shows how to make an indirect function call using small-model position-independent code. - +
Small-Model Position-Independent Indirect Function Call @@ -7187,14 +7687,36 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - extern void function(); extern void (*ptrfunc) - (); ptrfunc = function; ... (*ptrfunc) (); + extern void function(); +extern void (*ptrfunc) (); + + +ptrfunc = function; + + +... +(*ptrfunc) (); + + + + + - .section .text /* TOC pointer is in r2 */ ld - r9,ptrfunc@got(r2) ld r0,function@got(r2) std r0,0(r9) ... ld - r9,ptrfunc@got(r2) ld r12,0(r9) mtctr r12 std r2,24(r1) bctrl - ld r2,24(r1) + + +.section .text +/* TOC pointer is in r2 */ +ld r9,ptrfunc@got(r2) +ld r0,function@got(r2) +std r0,0(r9) +... +ld r9,ptrfunc@got(r2) +ld r12,0(r9) +mtctr r12 +std r2,24(r1) +bctrl +ld r2,24(r1) @@ -7203,7 +7725,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> shows how to make an indirect function call using large-model position-independent code. - +
Large-Model Position-Independent Indirect Function Call @@ -7226,15 +7748,38 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - extern void function(); extern void (*ptrfunc) - (); ptrfunc=function; (*ptrfunc) (); + extern void function(); +extern void (*ptrfunc) (); +ptrfunc=function; + + + + + +(*ptrfunc) (); + + + + + + - addis r9,r2,ptrfunc@got@ha ld - r9,ptrfunc@got@l(r9) addis r12,r2,function@got@ha ld - r12,function@got@l(r12) std r12,0(r9) addis - r9,r2,ptrfunc@got@ha ld r9,ptrfunc@got@l(r9) ld r12,0(r9) std - r2,24(r1) mtctr r12 bctrl ld r2,24(r1) + + +addis r9,r2,ptrfunc@got@ha +ld r9,ptrfunc@got@l(r9) +addis r12,r2,function@got@ha +ld r12,function@got@l(r12) +std r12,0(r9) + +addis r9,r2,ptrfunc@got@ha +ld r9,ptrfunc@got@l(r9) +ld r12,0(r9) +std r2,24(r1) +mtctr r12 +bctrl +ld r2,24(r1) @@ -7256,8 +7801,12 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> with an R_PPC64_TOCSAVE relocation that points to a nop provided in the caller's prologue. In that case, the stub code can omit the r2 save. Instead, the linker replaces the prologue nop with an r2 save. - tocsaveloc: nop ... bl target .reloc ., R_PPC64_TOCSAVE, - tocsaveloc nop + tocsaveloc: + nop + ... +bl target + .reloc ., R_PPC64_TOCSAVE, tocsaveloc + nopThe linker may assume that r2 is valid at the point of a call. Thus, stub code may use r2 to load an address from the PLT unless the call is marked with an R_PPC64_REL24_NOTOC relocation to indicate that r2 @@ -7279,7 +7828,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> shows the model for branch instructions. - +
Branch Instruction Model @@ -7302,10 +7851,14 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - label: ... goto label; + label: +... +goto label; - .L01: ... b .L01 + .L01: +... +b .L01 @@ -7333,7 +7886,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> For position-dependent code (for example, the main module of an application) loaded into the low or high address range, absolute addressing of a branch table yields the best performance. - +
Absolute Switch Code (Within) for static modules located in low or high 2 GB of address space @@ -7357,14 +7910,36 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - switch(j) { case 0: ... case 1: ... case 3: ... - default: ... } + switch(j) +{ +case 0: +... +case 1: +... +case 3: +... +default: +... +} + + + - cmplwi r12, 4 bge .Ldefault slwi r12, 2 addis - r12, r12, .Ltab@ha lwa r12, .Ltab@l(r12) mtctr r12 bctr .rodata - .Ltab: .long .Lcase0 .long .Lcase1 .long .Ldefault .long - .Lcase3 .text + cmplwi r12, 4 +bge .Ldefault +slwi r12, 2 +addis r12, r12, .Ltab@ha +lwa r12, .Ltab@l(r12) +mtctr r12 +bctr +.rodata +.Ltab: +.long .Lcase0 +.long .Lcase1 +.long .Ldefault +.long .Lcase3 +.text @@ -7398,14 +7973,36 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - switch(j) { case 0: ... case 1: ... case 3: ... - default: ... } + switch(j) +{ +case 0: +... +case 1: +... +case 3: +... +default: +... +} + + + - cmplwi r12, 4 bge .Ldefault slwi r12, 2 addis - r12, r12, .Ltab@ha ld r12, .Ltab@l(r12) mtctr r12 bctr .rodata - .Ltab: .quad .Lcase0 .quad .Lcase1 .quad .Ldefault .quad - .Lcase3 .text + cmplwi r12, 4 +bge .Ldefault +slwi r12, 3 +addis r12, r12, .Ltab@ha +ld r12, .Ltab@l(r12) +mtctr r12 +bctr +.rodata +.Ltab: +.quad .Lcase0 +.quad .Lcase1 +.quad .Ldefault +.quad .Lcase3 +.text @@ -7417,7 +8014,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> pointer points to a fixed offset from the code segment. The use of relative offsets from the start address of the branch table ensures position-independence when code is loaded at different addresses. - +
Position-Independent Switch Code for Small/Medium Models @@ -7440,15 +8037,36 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - switch(j) { case 0: ... case 1: ... case 3: ... - default: ... } + switch(j) +{ +case 0: +... +case 1: +... +case 3: +... +default: +... +} + + + - cmplwi r12, 4 bge .Ldefault addis - r10,r2,(.Ltab-.TOC.)@ha addi r10,r10,(.Ltab-.TOC.)@l slwi r12,2 - lwax r8,r10,r12 add r10,r8,r10 mtctr r10 bctr .Ltab: .word - (.Lcase0-.Ltab) .word (.Lcase1-.Ltab) .word (.Ldefault-.Ltab) - .word (.Lcase3-.Ltab) + cmplwi r12, 4 +bge .Ldefault +addis r10,r2,(.Ltab-.TOC.)@ha +addi r10,r10,(.Ltab-.TOC.)@l +slwi r12,2 +lwax r8,r10,r12 +add r10,r8,r10 +mtctr r10 +bctr +.Ltab: +.word (.Lcase0-.Ltab) +.word (.Lcase1-.Ltab) +.word (.Ldefault-.Ltab) +.word (.Lcase3-.Ltab) @@ -7461,7 +8079,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> GB. The use of relative offsets from the start address of the branch table ensures position independence when code is loaded at different addresses. - +
Position-Independent Switch Code for All Models (alternate, with GOT-indirect addressing) @@ -7485,15 +8103,36 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - switch(j) { case 0: ... case 1: ... case 3: ... - default: ... } + switch(j) +{ +case 0: +... +case 1: +... +case 3: +... +default: +... +} + + + - cmplwi r12, 4 bge .Ldefault addis - r10,r2,.Ltab@got@ha ld r10,.Ltab@got@l(r10) slwi r12,2 lwax - r8,r10,r8 add r10,r8,r12 mtctr r10 bctr .Ltab: .word - (.Lcase0-.Ltab) .word (.Lcase1-.Ltab) .word (.Ldefault-.Ltab) - .word (.Lcase3-.Ltab) + cmplwi r12, 4 +bge .Ldefault +addis r10,r2,.Ltab@got@ha +ld r10,.Ltab@got@l(r10) +slwi r12,2 +lwax r8,r10,r12 +add r10,r8,r10 +mtctr r10 +bctr +.Ltab: +.word (.Lcase0-.Ltab) +.word (.Lcase1-.Ltab) +.word (.Ldefault-.Ltab) +.word (.Lcase3-.Ltab) @@ -7506,10 +8145,20 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> implementations.
PIC Code that Avoids the lwa Instruction - .text f1: addis r9,r2,.Ltab@ha sldi r10,r3,2 addi - r9,r9,.Ltab@l lwzx r10,r10,r9 sub r10,r2,r10 mtctr r10 bctr .Ltab: - .long .TOC. - Lcase0 .long .TOC. - Lcase1 .long .TOC. - Ldefault .long - .TOC. - Lcase13 + .text +f1: + addis r9,r2,(.Ltab-.TOC.)@ha + sldi r10,r3,2 + addi r9,r9,(.Ltab-.TOC.)@l + lwzx r10,r10,r9 + sub r10,r2,r10 + mtctr r10 + bctr +.Ltab: + .long .TOC. - Lcase0 + .long .TOC. - Lcase1 + .long .TOC. - Ldefault + .long .TOC. - Lcase3
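The position-independence argument for relative-offset branch tables can be checked with a small C model. The model assumes a table of 32-bit entries, each holding the case label address minus the table address as fixed at link time (the .word (.LcaseN-.Ltab) form). It mirrors the sign-extending lwax load followed by the add. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a relative branch table: each entry holds
   (case label address - table address), computed at link time. */
struct rel_table {
    uint64_t table_addr;    /* run-time address of .Ltab       */
    const int32_t *entries; /* signed 32-bit offsets (.word)   */
    size_t count;
};

/* Equivalent of "lwax r8,r10,r12; add r10,r8,r10": sign-extend the
   entry, then add the run-time table address (arithmetic mod 2^64). */
static uint64_t rel_table_target(const struct rel_table *t, size_t index)
{
    assert(index < t->count);
    return t->table_addr + (uint64_t)(int64_t)t->entries[index];
}
```

The entries never change. Loading the module at a different address moves table_addr and every case label by the same amount, so the computed targets stay correct without relocating the table.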
@@ -7553,7 +8202,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> shows the organization of a stack frame before a dynamic allocation. - +
Before Dynamic Stack Allocation @@ -7565,12 +8214,14 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
Example Code to Allocate n Bytes - #define n 13 ; char *a = alloca(n); ; rnd(x) = round x - to be multiple of stack alignment ; psave = size of parameter save area - (may be zero). p = 32 + rnd(sizeof(psave)+15); Offset to the start of - the dynamic allocation ld r0,0(r1) ; Load stdu r0,-rnd(n+15)(r1) ; - Store new back chain, quadword-aligned. addi r3,r1,p ; R3 = new data - area following parameter save area. + #define n 13 +; char *a = alloca(n); +; rnd(x) = round x to a multiple of the stack alignment +; psave = size of parameter save area (may be zero). +p = 32 + rnd(sizeof(psave)+15) ; Offset to the start of the dynamic allocation +ld r0,0(r1) ; Load back chain +stdu r0,-rnd(n+15)(r1) ; Store new back chain, quadword-aligned. +addi r3,r1,p ; R3 = new data area following parameter save area.
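The size arithmetic in this listing can be sketched in C. The sketch makes two assumptions. First, rnd() truncates to a multiple of the 16-byte (quadword) stack alignment, which is the reading under which rnd(n+15) rounds n up. Second, the constant 32 is the fixed linkage area at the bottom of the frame. The parameter psave_size stands in for the listing's sizeof(psave), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_ALIGN 16u /* quadword stack alignment */

static_assert((STACK_ALIGN & (STACK_ALIGN - 1u)) == 0,
              "stack alignment must be a power of two");

/* rnd(x): truncate x to a multiple of STACK_ALIGN (assumed meaning). */
static uint64_t rnd(uint64_t x)
{
    return x & ~(uint64_t)(STACK_ALIGN - 1u);
}

/* Amount subtracted from r1 by "stdu r0,-rnd(n+15)(r1)". */
static uint64_t alloca_adjust(uint64_t n)
{
    return rnd(n + STACK_ALIGN - 1u);
}

/* p: offset from the new r1 to the allocated bytes, past the assumed
   32-byte linkage area and the aligned parameter save area. */
static uint64_t alloca_offset(uint64_t psave_size)
{
    return 32u + rnd(psave_size + STACK_ALIGN - 1u);
}
```

With the listing's n of 13, alloca_adjust(13) evaluates to 16. With no parameter save area, the data starts at offset 32 from the new stack pointer.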
Because it is allowed (and common) to return without first deallocating this dynamically allocated memory, all the linkage @@ -7619,7 +8270,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> DWARF. DWARF register numbers 32 - 63 and 77 - 108 are also used to indicate the location of variables in VSX registers vsr0 - vsr31 and vsr32 - vsr63, respectively, in DWARF debug information. - +
Mappings of Common Registers @@ -7901,7 +8552,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> DWARF for the OpenPOWER ABI defines the address class codes described in . - +
Address Class Codes @@ -7955,6 +8606,6 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> in the Itanium C++ ABI, the normative text on the issue. For information about how to locate this material, see . - + diff --git a/specification/ch_3.xml b/specification/ch_3.xml index 6a85e8b..0745a03 100644 --- a/specification/ch_3.xml +++ b/specification/ch_3.xml @@ -10,11 +10,9 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> ELF header identification array, e_ident[EI_DATA], holds the value 2, defined as data encoding ELFDATA2MSB. For a little-endian encoded ELF file, it holds the value 1, defined as data encoding ELFDATA2LSB. - - e_ident[EI_CLASS] ELFCLASS64 For all 64-bit implementations. - e_ident[EI_DATA] ELFDATA2MSB For all big-endian implementations. - e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations. - + e_ident[EI_CLASS] ELFCLASS64 For all 64-bit implementations. +e_ident[EI_DATA] ELFDATA2MSB For all big-endian implementations. +e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations. The ELF header's e_flags member holds bit flags associated with the file. The 64-bit PowerPC processor family defines the following flags. @@ -57,9 +55,7 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> The ABI version to be used for the ELF header file is specified with the .abiversion pseudo-op: - - .abiversion 2 - + .abiversion 2 Processor identification resides in the ELF header's e_machine member, and must have the value EM_PPC64, defined as the value 21. @@ -253,18 +249,16 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> The TOC may straddle the boundary between initialized and uninitialized data in the data segment. 
The common order of sections in the data segment, some of which may be empty, follows: - - .rodata - .data - .data1 - .got - .toc - .sdata - .sbss - .plt - .bss1 - .bss - + .rodata +.data +.data1 +.got +.toc +.sdata +.sbss +.plt +.bss1 +.bss The medium code model is expected to provide a sufficiently large TOC to provide all data addressing needs of a module with a single TOC. Compilers may generate two-instruction medium code model references @@ -275,6 +269,7 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> instruction of the two instruction form with a nop and rewriting the second instruction. Consequently, the TOC pointer must be live during the first and second instruction of a two-instruction reference.) +   Modules Containing Multiple TOCs The link editor may create multiple TOCs. In such a case, the constituent .got, .toc, .sdata, and .sbss sections are conceptually @@ -436,16 +431,14 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> The local-entry-point handling field of st_other is generated with the .localentry pseudo op: - - .globl my_func - .type my_func, @function - my_func: - addis r2, r12, my_sym@ha(.TOC.-my_func) - addi r2, r2, my_sym@l(.TOC.-my_func) - .localentry my_func, .-my_func - ... ; function definition - blr - + .globl my_func + .type my_func, @function +my_func: + addis r2, r12, my_sym@ha(.TOC.-my_func) + addi r2, r2, my_sym@l(.TOC.-my_func) + .localentry my_func, .-my_func + ... ; function definition + blr Functions called via symbols with an st_other value of 0 may be called without a valid TOC pointer in r2. Symbols of functions that require a local entry with a valid TOC pointer should generate a symbol @@ -2433,13 +2426,10 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> - - - Relocation values 8, 9, 12, 13, 18, 23, 32, + Note:Relocation values 8, 9, 12, 13, 18, 23, 32, and 247 are not used. 
This is to maintain a correspondence to the relocation values used by the - 32-bit PowerPC ELF ABI. - + 32-bit PowerPC ELF ABI. @@ -4201,10 +4191,8 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> stored is given by the assembly syntax symbol@got. The value of the symbol alone is the address of the variable named symbol. For example: - - addis r3, r2,x@got@ha - ld r3,x@got@l(r3) - + addis r3, r2,x@got@ha +ld r3,x@got@l(r3) Although the Power ISA only defines 16-bit displacements, many TOCs (and hence a GOT) are larger than 64 KB but fit within 2 GB, which can be addressed with 32-bit offsets from r2. Therefore, this ABI defines a @@ -4239,18 +4227,14 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> current object file. The following code might appear in a PIC code setup sequence to compute the distance from a function entry point to the TOC base: - - addis 2,12,.TOC.-func@ha - addi 2,2,.TOC.-func@l - + addis 2,12,.TOC.-func@ha +addi 2,2,.TOC.-func@l The syntax SYMBOL@localentry refers to the value of the local entry point associated with a function symbol. It can be used to initialize a memory word with the address of the local entry point as follows: - - .quad func@localentry - + .quad func@localentry
@@ -4282,15 +4266,11 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> may optimize TOC reference code that consists of two instructions with equivalent code when offset@ha is 0. TOC reference code: - - addis rt, r2, offset@ha - lwz rt, offset@l(rt) - + addis rt, r2, offset@ha +lwz rt, offset@l(rt) Equivalent code: - - NOP - lwz rt, offset(r2) - + NOP +lwz rt, offset(r2) Compilers and programmers must ensure that r2 is live at the actual data access point associated with extended displacement addressing. @@ -4308,32 +4288,30 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> disabling linker optimization. However, this behavior in support of non-ABI-compliant code is not guaranteed to be portable and supported in all systems. +   Compliant example - - addis r4, r2, mysym@toc@ha - b target - - - ... - - - addis r4, r2, mysym@toc@ha - target: - addi r4, r4, mysym@toc@l - ... - + addis r4, r2, mysym@toc@ha + b target + + + ... + + + addis r4, r2, mysym@toc@ha +target: + addi r4, r4, mysym@toc@l + ... +   Non-compliant example - - li r4, 0 ; #d1 - b target - - ... - - addis r4, r2, mysym@toc@ha ; #d2 - target: - addi r4, r4, mysym@toc@l ; incompatible definitions #d1 and #d2 reach this - ... - + li r4, 0 ; #d1 + b target + + ... + + addis r4, r2, mysym@toc@ha ; #d2 +target: + addi r4, r4, mysym@toc@l ; incompatible definitions #d1 and #d2 reach this + ...
Table Jump Sequences @@ -4349,13 +4327,11 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> first an addis followed by a second instruction using a D form instruction to create or load from a 32-bit offset from a register to enable hardware fusion whenever possible: - - addis r4, r3, upper - <lbz,lhz,lwz,ld> r4, lower(r4) - - addis r4, r3, upper - addi r4, r4, lower - + addis r4, r3, upper +<lbz,lhz,lwz,ld> r4, lower(r4) + +addis r4, r3, upper +addi r4, r4, lower It is encouraged that assemblers provide pseudo-ops to facilitate such code generation with a single assembler mnemonic.
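Two points made earlier rest on the same 32-bit arithmetic: the linker may replace addis with a nop when offset@ha is 0, and an addis can be paired with a D-form instruction. The minimal C sketch below shows when that addis is redundant. It assumes the high-adjusted definition: (offset + 0x8000) >> 16, truncated to 16 bits. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* @ha and @l of a signed 32-bit offset, computed on its two's-complement bits. */
static uint16_t ha32(int32_t offset) { return (uint16_t)(((uint32_t)offset + 0x8000u) >> 16); }
static uint16_t lo32(int32_t offset) { return (uint16_t)((uint32_t)offset & 0xffffu); }

/* Offset reached by "addis rt,rb,ha" plus a D-form "lo(rt)" access,
   relative to rb: sign-extended ha times 65536 plus sign-extended lo. */
static int64_t split_value(uint16_t ha, uint16_t lo)
{
    int64_t high = (ha & 0x8000u) ? (int64_t)ha - 0x10000 : (int64_t)ha;
    int64_t low = (lo & 0x8000u) ? (int64_t)lo - 0x10000 : (int64_t)lo;
    return high * 65536 + low;
}

/* True when the addis adds nothing, so it may become a nop and the
   access can use offset(r2) directly. */
static bool addis_is_redundant(int32_t offset)
{
    return ha32(offset) == 0;
}
```

Because of the sign adjustment, the pair reaches offsets from -0x80008000 through 0x7fff7fff. Offsets from 0x7fff8000 through 0x7fffffff cannot be formed this way.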
@@ -4408,59 +4384,53 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> computed as negative offsets from the TCB address. The fields must never be rearranged for any reason.The current glibc extended TCB is: - - typedef struct { - /* Reservation for HWCAP data. */ - unsigned int hwcap2; - unsigned int hwcap; /* not used in LE ABI */ - - /* Indicate if HTM capable (ISA 2.07). */ - int tm_capable; - int tm_pad; - - /* Reservation for dynamic system optimizer ABI. */ - uintptr_t dso_slot2; - uintptr_t dso_slot1; - - /* Reservation for tar register (ISA 2.07). */ - uintptr_t tar_save; - - /* GCC split stack support. */ - void *__private_ss; - - /* Reservation for the event-based branching ABI. */ - uintptr_t ebb_handler; - uintptr_t ebb_ctx_pointer; - uintptr_t ebb_reserved1; - uintptr_t ebb_reserved2; - uintptr_t pointer_guard; - - /* Reservation for stack guard */ - uintptr_t stack_guard; - - /* DTV pointer */ - dtv_t *dtv; - } tcbhead_t; - + typedef struct { + /* Reservation for HWCAP data. */ + unsigned int hwcap2; + unsigned int hwcap; /* not used in LE ABI */ + + /* Indicate if HTM capable (ISA 2.07). */ + int tm_capable; + int tm_pad; + + /* Reservation for dynamic system optimizer ABI. */ + uintptr_t dso_slot2; + uintptr_t dso_slot1; + + /* Reservation for tar register (ISA 2.07). */ + uintptr_t tar_save; + + /* GCC split stack support. */ + void *__private_ss; + + /* Reservation for the event-based branching ABI. */ + uintptr_t ebb_handler; + uintptr_t ebb_ctx_pointer; + uintptr_t ebb_reserved1; + uintptr_t ebb_reserved2; + uintptr_t pointer_guard; + + /* Reservation for stack guard */ + uintptr_t stack_guard; + + /* DTV pointer */ + dtv_t *dtv; + } tcbhead_t;Modules that will not be unloaded will be present at startup time; the TLS blocks for these are created consecutively and immediately follow the TCB. 
The offset of the TLS block of an initially available module from the TCB remains fixed after program start. The tlsoffset(m) values for a module with index m, where m ranges from 1 to M, M being the total number of modules, are computed as follows: - - tlsoffset(1) = round(16, align(1)) - tlsoffset(m + 1) = round(tlsoffset(m) + tlssize(m), align(m + 1)) - + tlsoffset(1) = round(16, align(1)) +tlsoffset(m + 1) = round(tlsoffset(m) + tlssize(m), align(m + 1)) The function round() returns its first argument rounded up to the next multiple of its second argument: - - round(x, y) = y × ceiling(x / y) - + round(x, y) = y × ceiling(x / y) The function ceiling() returns the smallest integer greater @@ -4468,24 +4438,20 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> 1 < x ≤ n: - - ceiling(x) = n - + ceiling(x) = n In the case of dynamic shared objects (DSO), TLS blocks are allocated on an as-needed basis, with the details of allocation abstracted away by the __tls_get_addr() function, which is used to retrieve the address of any TLS variable. The prototype for the __tls_get_addr() function is defined as follows. - - typedef struct - { - unsigned long int ti_module; - unsigned long int ti_offset; - } tls_index; - - extern void *__tls_get_addr (tls_index *ti); - + typedef struct +{ + unsigned long int ti_module; + unsigned long int ti_offset; +} tls_index; + +extern void *__tls_get_addr (tls_index *ti); The thread pointer (TP) is held in r13 and is used to access the TCB. The TP is initialized to point 0x7000 bytes past the end of the TCB. The TP offset allows for efficient addressing of the TCB and up to 4 KB - @@ -4551,10 +4517,8 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> thread-local variable x, the __tls_get_addr() function is called with one parameter. That parameter is a pointer to a data object of type tls_index. - - extern __thread unsigned int x; - &x; - + extern __thread unsigned int x; +&x;
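For modules present at startup, the tlsoffset(m) recurrence defined earlier in this section can be sketched in C. The array-based interface is a hypothetical convenience and is not a glibc API. The module sizes and alignments used to exercise it are made-up inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* round(x, y) = y * ceiling(x / y): x rounded up to a multiple of y. */
static uint64_t tls_round(uint64_t x, uint64_t y)
{
    assert(y > 0);
    return y * ((x + y - 1) / y);
}

/* Compute tlsoffset(1..count) into offsets[0..count-1].
   size[i] and align[i] are tlssize(i + 1) and align(i + 1). */
static void tls_offsets(const uint64_t *size, const uint64_t *align,
                        uint64_t *offsets, size_t count)
{
    if (count == 0)
        return;
    offsets[0] = tls_round(16, align[0]);              /* tlsoffset(1)     */
    for (size_t m = 1; m < count; m++)                 /* tlsoffset(m + 1) */
        offsets[m] = tls_round(offsets[m - 1] + size[m - 1], align[m]);
}
```

Take three modules of 4, 24, and 8 bytes with alignments 4, 16, and 8. Their blocks start at offsets 16, 32, and 56.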
General Dynamic Initial Relocations @@ -4702,14 +4666,12 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> sequences may be used, depending on the size of the thread storage block offset to the variable. For the following code sequence, a different relocation sequence is used for each variable. - - static __thread unsigned int x1; - static __thread unsigned int x2; - static __thread unsigned int x3; - &x1; - &x2; - &x3; - + static __thread unsigned int x1; +static __thread unsigned int x2; +static __thread unsigned int x3; +&x1; +&x2; +&x3;
Local Dynamic Initial Relocations @@ -5100,10 +5062,8 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> Given the following code fragment, the relocation sequence in is used for the Initial Exec TLS Model: - - extern __thread unsigned int x; - &x; - + extern __thread unsigned int x; +&x;
Initial Exec Initial Relocations @@ -5232,10 +5192,8 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> less than 2 GB + 28 KB relative to the end of the TCB. The third sequence is identical to the Initial Exec sequence shown in . - - static __thread unsigned int x; - &x; - + static __thread unsigned int x; +&x; illustrates which sequence is used. @@ -5765,12 +5723,10 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> , a linker may reschedule the sequence to exploit fusion by generating a sequence that may be fused by Power processors: - - nop - addis r3, r13, x@tprel@ha - addi r3, r3, x@tprel@l - nop - + nop +addis r3, r13, x@tprel@ha +addi r3, r3, x@tprel@l +nop
@@ -6752,6 +6708,7 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> For more information, see . For TLS relocations, see . +  TLS Relocation DescriptionsThe following marker relocations tie together instructions in TLS code sequences. They allow the link editor to reliably optimize TLS code. diff --git a/specification/ch_4.xml b/specification/ch_4.xml index 59fd497..0e8cf19 100644 --- a/specification/ch_4.xml +++ b/specification/ch_4.xml @@ -341,10 +341,8 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> argument passing. For example, a C program might typically issue the following declaration to begin executing at the local entry point of a function named main: - - extern int main (int argc, char *argv[], char *envp[], void *auxv[]); - int main(int argc, char *argv[], char *envp[], ElfW(auxv_t) *auxvec) - + extern int main (int argc, char *argv[], char *envp[], void *auxv[]); +int main(int argc, char *argv[], char *envp[], ElfW(auxv_t) *auxvec)where: @@ -546,55 +544,53 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> application program to another. However, the auxiliary vector conveys information from the operating system to the program. 
This vector is an array of structures, defined as follows: - - typedef struct + typedef struct +{ + long a_type; + union { - long a_type; - union - { - long a_val; - void *a_ptr; - void (*a_fcn)(); - } a_un; - } auxv_t; - - Name Value a_un field Comment - AT_NULL 0 ignored /* End of vector */ - AT_PHDR 3 a_ptr /* Program headers for program */ - AT_PHENT 4 a_val /* Size of program header entry */ - AT_PHNUM 5 a_val /* Number of program headers */ - AT_PAGESZ 6 a_val /* System page size */ - AT_BASE 7 a_ptr /* Base address of interpreter */ - AT_FLAGS 8 a_val /* Flags */ - AT_ENTRY 9 a_ptr /* Entry point of program */ - AT_UID 11 /* Real user ID (uid) */ - AT_EUID 12 /* Effective user ID (euid) */ - AT_GID 13 /* Real group ID (gid) */ - AT_EGID 14 /* Effective group ID (egid) */ - AT_PLATFORM 15 a_ptr /* String identifying platform. */ - AT_HWCAP 16 a_val /* Machine-dependent hints about - processor capabilities. */ - AT_CLKTCK 17 /* Frequency of times(), always 100 */ - AT_DCACHEBSIZE 19 a_val /* Data cache block size */ - AT_ICACHEBSIZE 20 a_val /* Instruction cache block size */ - AT_UCACHEBSIZE 21 a_val /* Unified cache block size */ - AT_IGNOREPPC 22 /* Ignore this entry! */ - AT_SECURE 23 /* Boolean, was exec authorized to use - setuid or setgid */ - AT_BASE_PLATFORM 24 a_ptr /* String identifying real platforms */ - AT_RANDOM 25 /* Address of 16 random bytes */ - AT_HWCAP2 26 a_val /* More machine-dependent hints about - processor capabilities. */ - AT_EXECFN 31 /* File name of executable */ - AT_SYSINFO_EHDR 33 /* In many architectures, the kernel - provides a virtual dynamic shared - object (VDSO) that contains a function - callable from the user state. - AT_SYSINFO_EHDR is the address of the - VDSO header that is used by the - dynamic linker to resolve function - symbols with the VDSO. 
*/ - + long a_val; + void *a_ptr; + void (*a_fcn)(); + } a_un; +} auxv_t; + +Name Value a_un field Comment +AT_NULL 0 ignored /* End of vector */ +AT_PHDR 3 a_ptr /* Program headers for program */ +AT_PHENT 4 a_val /* Size of program header entry */ +AT_PHNUM 5 a_val /* Number of program headers */ +AT_PAGESZ 6 a_val /* System page size */ +AT_BASE 7 a_ptr /* Base address of interpreter */ +AT_FLAGS 8 a_val /* Flags */ +AT_ENTRY 9 a_ptr /* Entry point of program */ +AT_UID 11 /* Real user ID (uid) */ +AT_EUID 12 /* Effective user ID (euid) */ +AT_GID 13 /* Real group ID (gid) */ +AT_EGID 14 /* Effective group ID (egid) */ +AT_PLATFORM 15 a_ptr /* String identifying platform. */ +AT_HWCAP 16 a_val /* Machine-dependent hints about + processor capabilities. */ +AT_CLKTCK 17 /* Frequency of times(), always 100 */ +AT_DCACHEBSIZE 19 a_val /* Data cache block size */ +AT_ICACHEBSIZE 20 a_val /* Instruction cache block size */ +AT_UCACHEBSIZE 21 a_val /* Unified cache block size */ +AT_IGNOREPPC 22 /* Ignore this entry! */ +AT_SECURE 23 /* Boolean, was exec authorized to use + setuid or setgid */ +AT_BASE_PLATFORM 24 a_ptr /* String identifying real platforms */ +AT_RANDOM 25 /* Address of 16 random bytes */ +AT_HWCAP2 26 a_val /* More machine-dependent hints about + processor capabilities. */ +AT_EXECFN 31 /* File name of executable */ +AT_SYSINFO_EHDR 33 /* In many architectures, the kernel + provides a virtual dynamic shared + object (VDSO) that contains a function + callable from the user state. + AT_SYSINFO_EHDR is the address of the + VDSO header that is used by the + dynamic linker to resolve function + symbols with the VDSO. */ AT_NULL The auxiliary vector has no fixed length; instead an entry of this type denotes the end of the vector. The corresponding value of a_un is @@ -660,44 +656,40 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> AT_HWCAP The a_val member of this entry is a bit map of hardware capabilities. 
Some bit mask values include: - - PPC_FEATURE_32 0x80000000 /* Always set for powerpc64 */ - PPC_FEATURE_64 0x40000000 /* Always set for powerpc64 */ - PPC_FEATURE_HAS_ALTIVEC 0x10000000 - PPC_FEATURE_HAS_FPU 0x08000000 - PPC_FEATURE_HAS_MMU 0x04000000 - PPC_FEATURE_UNIFIED_CACHE 0x01000000 - PPC_FEATURE_NO_TB 0x00100000 /* 601/403gx have no timebase */ - PPC_FEATURE_POWER4 0x00080000 /* POWER4 ISA 2.00 */ - PPC_FEATURE_POWER5 0x00040000 /* POWER5 ISA 2.02 */ - PPC_FEATURE_POWER5_PLUS 0x00020000 /* POWER5+ ISA 2.03 */ - PPC_FEATURE_CELL_BE 0x00010000 /* CELL Broadband Engine */ - PPC_FEATURE_BOOKE 0x00008000 /* ISA Category Embedded */ - PPC_FEATURE_SMT 0x00004000 /* Simultaneous Multi-Threading */ - PPC_FEATURE_ICACHE_SNOOP 0x00002000 - PPC_FEATURE_ARCH_2_05 0x00001000 /* ISA 2.05 */ - PPC_FEATURE_PA6T 0x00000800 /* PA Semi 6T Core */ - PPC_FEATURE_HAS_DFP 0x00000400 /* Decimal FP Unit */ - PPC_FEATURE_POWER6_EXT 0x00000200 /* P6 + mffgpr/mftgpr */ - PPC_FEATURE_ARCH_2_06 0x00000100 /* ISA 2.06 */ - PPC_FEATURE_HAS_VSX 0x00000080 /* P7 Vector Extension. 
*/ - PPC_FEATURE_PSERIES_PERFMON_COMPAT 0x00000040 - PPC_FEATURE_TRUE_LE 0x00000002 - PPC_FEATURE_PPC_LE 0x00000001 - + PPC_FEATURE_32 0x80000000 /* Always set for powerpc64 */ +PPC_FEATURE_64 0x40000000 /* Always set for powerpc64 */ +PPC_FEATURE_HAS_ALTIVEC 0x10000000 +PPC_FEATURE_HAS_FPU 0x08000000 +PPC_FEATURE_HAS_MMU 0x04000000 +PPC_FEATURE_UNIFIED_CACHE 0x01000000 +PPC_FEATURE_NO_TB 0x00100000 /* 601/403gx have no timebase */ +PPC_FEATURE_POWER4 0x00080000 /* POWER4 ISA 2.00 */ +PPC_FEATURE_POWER5 0x00040000 /* POWER5 ISA 2.02 */ +PPC_FEATURE_POWER5_PLUS 0x00020000 /* POWER5+ ISA 2.03 */ +PPC_FEATURE_CELL_BE 0x00010000 /* CELL Broadband Engine */ +PPC_FEATURE_BOOKE 0x00008000 /* ISA Category Embedded */ +PPC_FEATURE_SMT 0x00004000 /* Simultaneous Multi-Threading */ +PPC_FEATURE_ICACHE_SNOOP 0x00002000 +PPC_FEATURE_ARCH_2_05 0x00001000 /* ISA 2.05 */ +PPC_FEATURE_PA6T 0x00000800 /* PA Semi 6T Core */ +PPC_FEATURE_HAS_DFP 0x00000400 /* Decimal FP Unit */ +PPC_FEATURE_POWER6_EXT 0x00000200 /* P6 + mffgpr/mftgpr */ +PPC_FEATURE_ARCH_2_06 0x00000100 /* ISA 2.06 */ +PPC_FEATURE_HAS_VSX 0x00000080 /* P7 Vector Extension. */ +PPC_FEATURE_PSERIES_PERFMON_COMPAT 0x00000040 +PPC_FEATURE_TRUE_LE 0x00000002 +PPC_FEATURE_PPC_LE 0x00000001 AT_HWCAP2 The a_val member of this entry is a bit map of hardware capabilities. 
Some bit mask values include: - - PPC_FEATURE2_ARCH_2_07 0x80000000 /* ISA 2.07 */ - PPC_FEATURE2_HAS_HTM 0x40000000 /* Hardware Transactional Memory */ - PPC_FEATURE2_HAS_DSCR 0x20000000 /* Data Stream Control Register */ - PPC_FEATURE2_HAS_EBB 0x10000000 /* Event Base Branching */ - PPC_FEATURE2_HAS_ISEL 0x08000000 /* Integer Select */ - PPC_FEATURE2_HAS_TAR 0x04000000 /* Target Address Register */ - PPC_FEATURE2_HAS_VCRYPTO 0x02000000 /* The processor implements the - Vector.AES category */ - + PPC_FEATURE2_ARCH_2_07 0x80000000 /* ISA 2.07 */ +PPC_FEATURE2_HAS_HTM 0x40000000 /* Hardware Transactional Memory */ +PPC_FEATURE2_HAS_DSCR 0x20000000 /* Data Stream Control Register */ +PPC_FEATURE2_HAS_EBB 0x10000000 /* Event Base Branching */ +PPC_FEATURE2_HAS_ISEL 0x08000000 /* Integer Select */ +PPC_FEATURE2_HAS_TAR 0x04000000 /* Target Address Register */ +PPC_FEATURE2_HAS_VCRYPTO 0x02000000 /* The processor implements the + Vector.AES category */ When a process starts to execute, its stack holds the arguments, environment, and auxiliary vector received from the exec call. The system makes no guarantees about the relative arrangement of argument strings, @@ -934,14 +926,12 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en"> following the call. - - tocsaveloc: - nop - ... - bl target - .reloc ., R_PPC64_TOCSAVE, tocsaveloc - nop - + tocsaveloc: + nop + ... +bl target + .reloc ., R_PPC64_TOCSAVE, tocsaveloc + nop 3. The caller has not set up r2 to hold the TOC pointer. 
 This
@@ -965,45 +955,37 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en">
 A possible implementation for case 1 looks as follows (if func@plt@toc is less than 32 KB, the call stub may be simplified to omit the addis):
-
- std r2,24(r1)
- addis r12,r2,func@plt@toc@ha
- ld r12,func@plt@toc@l(r12)
- mtctr r12
- bctr
-
+ std r2,24(r1)
+addis r12,r2,func@plt@toc@ha
+ld r12,func@plt@toc@l(r12)
+mtctr r12
+bctr
 For case 2, the same implementation as for case 1 may be used, except that the first instruction “std r2,24(r1)” is omitted:
-
- addis r12,r2,func@plt@toc@ha
- ld r12,func@plt@toc@l(r12)
- mtctr r12
- bctr
-
+ addis r12,r2,func@plt@toc@ha
+ld r12,func@plt@toc@l(r12)
+mtctr r12
+bctr
 A possible implementation for case 3 looks as follows:
-
- mflr r0
- bcl 20,31,1f
- 1: mflr r2
- mtlr r0
- addis r2,r2,(.TOC.-1b)@ha
- addi r2,r2,(.TOC.-1b)@l
- addis r12,r2,func@plt@toc@ha
- ld r12,func@plt@toc@l(r12)
- mtctr r12
- bctr
-
+ mflr r0
+ bcl 20,31,1f
+1: mflr r2
+ mtlr r0
+ addis r2,r2,(.TOC.-1b)@ha
+ addi r2,r2,(.TOC.-1b)@l
+ addis r12,r2,func@plt@toc@ha
+ ld r12,func@plt@toc@l(r12)
+ mtctr r12
+ bctr
 When generating non-PIC code for the small or medium code model, a simpler variant may alternatively be used for cases 2 or 3:
-
- lis r12,func@plt@ha
- ld r12,func@plt@l(r12)
- mtctr r12
- bctr
-
+ lis r12,func@plt@ha
+ld r12,func@plt@l(r12)
+mtctr r12
+bctr
 To support lazy binding, the link editor also provides a set of symbol resolver stubs, one for each PLT entry. Each resolver stub consists of a single instruction, which is usually a branch to a common
@@ -1029,52 +1011,50 @@ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en">
 Beyond the above requirements, the implementation of the .glink resolver stubs is up to the link editor. The following shows an example implementation:
The following shows an example implementation: - - # ABI note: At entry to the resolver stub: - # - r12 holds the address of the res_N stub for the target routine - # - all argument registers hold arguments for the target routine - PLTresolve: - # Determine addressability. This sequence works for both PIC - # and non-PIC code and does not rely on presence of the TOC pointer. - mflr r0 - bcl 20,31,1f - 1: mflr r11 - mtlr r0 - # Compute .plt section index from entry point address in r12 - # .plt section index is placed into r0 as argument to the resolver - sub r0,r12,r11 - subi r0,r0,res_0-1b - srdi r0,r0,2 - # Load address of the first byte of the PLT - ld r12,PLToffset-1b(r11) - add r11,r12,r11 - # Load resolver address and DSO identifier from the - # first two doublewords of the PLT - ld r12,0(r11) - ld r11,8(r11) - # Branch to resolver - mtctr r12 - bctr - # ABI note: At entry to the resolver: - # - r12 holds the resolver address - # - r11 holds the DSO identifier - # - r0 holds the PLT index of the target routine - # - all argument registers hold arguments for the target routine - - # Constant pool holding offset to the PLT - # Note that there is no actual symbol PLT; the link editor - # synthesizes this value when creating the .glink section - PLToffset: - .quad PLT-. - - # A table of branches, one for each PLT entry - # The idea is that the PLT call stub loads r12 with these - # addresses, so (r12 - res_0) gives the PLT index × 4. - - res_0: b PLTresolve - res_1: b PLTresolve - ... - + # ABI note: At entry to the resolver stub: + # - r12 holds the address of the res_N stub for the target routine + # - all argument registers hold arguments for the target routine +PLTresolve: + # Determine addressability. This sequence works for both PIC + # and non-PIC code and does not rely on presence of the TOC pointer. 
+ mflr r0 + bcl 20,31,1f +1: mflr r11 + mtlr r0 + # Compute .plt section index from entry point address in r12 + # .plt section index is placed into r0 as argument to the resolver + sub r0,r12,r11 + subi r0,r0,res_0-1b + srdi r0,r0,2 + # Load address of the first byte of the PLT + ld r12,PLToffset-1b(r11) + add r11,r12,r11 + # Load resolver address and DSO identifier from the + # first two doublewords of the PLT + ld r12,0(r11) + ld r11,8(r11) + # Branch to resolver + mtctr r12 + bctr + # ABI note: At entry to the resolver: + # - r12 holds the resolver address + # - r11 holds the DSO identifier + # - r0 holds the PLT index of the target routine + # - all argument registers hold arguments for the target routine + + # Constant pool holding offset to the PLT + # Note that there is no actual symbol PLT; the link editor + # synthesizes this value when creating the .glink section +PLToffset: + .quad PLT-. + + # A table of branches, one for each PLT entry + # The idea is that the PLT call stub loads r12 with these + # addresses, so (r12 - res_0) gives the PLT index × 4. + +res_0: b PLTresolve +res_1: b PLTresolve + ... After resolution, the value of a PLT entry in the PLT is the address of the function’s global entry point, unless the resolver can determine that a module-local call occurs with a shared TOC value wherein diff --git a/specification/ch_5.xml b/specification/ch_5.xml index ffa6321..218dcd6 100644 --- a/specification/ch_5.xml +++ b/specification/ch_5.xml @@ -45,9 +45,7 @@ xml:id="dbdoclet.50655243_pgfId-1099317"> by the linker as necessary to resolve symbols.
- Types Defined in the Standard Header
- <anchor xml:id="dbdoclet.50655243___RefHeading___Toc377640670"
- xreflabel="" /> Types Defined in the Standard Header
+ Types Defined in the Standard Header
 The type va_list shall be defined as follows:
 typedef void * va_list;
 The following integer types are defined in headers, which must be
@@ -55,13 +53,13 @@ xml:id="dbdoclet.50655243_pgfId-1099317">
 such headers. They shall have the following definitions:
- typedef long ptrdiff_t;
+ typedef long long ptrdiff_t;
- typedef unsigned long size_t;
+ typedef unsigned longint size_t;
- typedef int wchar_t;
+ typedef intlong wchar_t;
 typedef int sig_atomic_t;
@@ -79,7 +77,7 @@ xml:id="dbdoclet.50655243_pgfId-1099317">
 typedef int int32_t;
- typedef long int64_t;
+ typedef long long int64_t;
 typedef unsigned char uint8_t;
@@ -91,7 +89,7 @@ xml:id="dbdoclet.50655243_pgfId-1099317">
 typedef unsigned int uint32_t;
- typedef unsigned long uint64_t;
+ typedef unsigned long long uint64_t;
 typedef signed char int_least8_t;
@@ -103,7 +101,7 @@ xml:id="dbdoclet.50655243_pgfId-1099317">
 typedef int int_least32_t;
- typedef long int_least64_t;
+ typedef long long int_least64_t;
 typedef unsigned char uint_least8_t;
@@ -115,7 +113,7 @@ xml:id="dbdoclet.50655243_pgfId-1099317">
 typedef unsigned int uint_least32_t;
- typedef unsigned long uint_least64_t;
+ typedef unsigned long long uint_least64_t;
 typedef signed char int_fast8_t;
@@ -127,7 +125,7 @@ xml:id="dbdoclet.50655243_pgfId-1099317">
 typedef int int_fast32_t;
- typedef long int_fast64_t;
+ typedef long long int_fast64_t;
 typedef unsigned char uint_fast8_t;
@@ -139,19 +137,19 @@ xml:id="dbdoclet.50655243_pgfId-1099317">
 typedef unsigned int uint_fast32_t;
- typedef unsigned long uint_fast64_t;
+ typedef unsigned long long uint_fast64_t;
- typedef long intptr_t;
+ typedef long long intptr_t;
- typedef unsigned long uintptr_t;
+ typedef unsigned long long uintptr_t;
- typedef long intmax_t;
+ typedef long long intmax_t;
- typedef unsigned long uintmax_t;
+ typedef unsigned long long uintmax_t;
diff --git a/specification/ch_6.xml b/specification/ch_6.xml
index aebe77b..e34f665 100644
--- a/specification/ch_6.xml
+++ b/specification/ch_6.xml
@@ -65,27 +65,23 @@ xml:id="dbdoclet.50655244_pgfId-1095944">
 access the n-th vector element from a vector pointer. The use of vector built-in functions such as vec_xl and vec_xst is discouraged except for languages where no dereference operators are available.
-
- vector char vca;
- vector char vcb;
- vector int via;
- int a[4];
- void *vp;
-
- via = *(vector int *) &a[0];
- vca = (vector char) via;
- vcb = vca;
- vca = *(vector char *)vp;
- *(vector char *)&a[0] = vca;
-
+ vector char vca;
+vector char vcb;
+vector int via;
+int a[4];
+void *vp;
+
+via = *(vector int *) &a[0];
+vca = (vector char) via;
+vcb = vca;
+vca = *(vector char *)vp;
+*(vector char *)&a[0] = vca;
 Compilers are expected to recognize and optimize multiple operations that can be optimized into a single hardware instruction. For example, a load and splat hardware instruction might be generated for the following sequence:
-
- double *double_ptr;
- register vector double vd = vec_splats(*double_ptr);
-
+ double *double_ptr;
+register vector double vd = vec_splats(*double_ptr);
Vector Operators @@ -231,7 +227,7 @@ xml:id="dbdoclet.50655244_pgfId-1095944">
- + vec_bperm @@ -243,7 +239,7 @@ xml:id="dbdoclet.50655244_pgfId-1095944"> the result. - + vec_cntlz_lsbb @@ -254,7 +250,7 @@ xml:id="dbdoclet.50655244_pgfId-1095944"> For LE, use vctzlsbb. - + vec_cnttz_lsbb @@ -276,7 +272,7 @@ xml:id="dbdoclet.50655244_pgfId-1095944"> vec_extract (v, 3) is equivalent to v[3]. - + vec_extract_fp32_ from_shorth @@ -288,7 +284,7 @@ xml:id="dbdoclet.50655244_pgfId-1095944"> For LE, extract the left four elements. - + vec_extract_fp32_ from_shortl @@ -300,7 +296,7 @@ xml:id="dbdoclet.50655244_pgfId-1095944"> For LE, extract the right four elements. - + vec_extract4b @@ -312,7 +308,7 @@ xml:id="dbdoclet.50655244_pgfId-1095944"> halves of the result. - + vec_first_match _index @@ -324,7 +320,7 @@ xml:id="dbdoclet.50655244_pgfId-1095944"> For LE, use vctz. - + vec_first_match _index_or_eos @@ -348,7 +344,7 @@ xml:id="dbdoclet.50655244_pgfId-1095944"> third element modified to contain x. - + vec_insert4b @@ -546,7 +542,7 @@ xml:id="dbdoclet.50655244_pgfId-1095944"> Use vupkhsb, and so on, for LE. - + vec_xl_len_r @@ -559,7 +555,7 @@ xml:id="dbdoclet.50655244_pgfId-1095944"> number of bytes specified to be loaded by vec_xl_len_r. - + vec_xst_len_r @@ -582,6 +578,7 @@ xml:id="dbdoclet.50655244_pgfId-1095944"> another vector data type in accordance with the C and C++ programming languages. +   Extended Data Movement Functions The built-in functions in map to Altivec/VMX load and @@ -1176,7 +1173,7 @@ xml:id="dbdoclet.50655244_pgfId-1095944"> - vec_xlw4 + vec_xlw4 Deprecated. The use of vector data type assignment and overloaded vec_xl and vec_xst vector @@ -1298,54 +1295,6 @@ xml:id="dbdoclet.50655244_pgfId-1095944"> - - - VEC_CONCAT (ARG1, ARG2) - (Fortran) - POWER ISA 3.0 - - - Purpose: - Concatenates two elements to form a vector. - Result value: - The resulting vector consists of the two scalar elements, - ARG1 and ARG2, assigned to elements 0 and 1 (using the - environment’s native endian numbering), respectively. 
- - - Note: This function corresponds to the C/C++ vector - constructor (vector type){a,b}. It is provided only for - languages without vector constructors. - - - - - - - POWER ISA 3.0 - - - vector signed long long vec_concat (signed long long, - signed long long); - - - - - POWER ISA 3.0 - - - vector unsigned long long vec_concat (unsigned long long, - unsigned long long); - - - - - POWER ISA 3.0 - - - vector double vec_concat (double, double); - - VEC_CONVERT(V, MOLD) diff --git a/specification/ch_preface.xml b/specification/ch_preface.xml index a89d04d..1d7434b 100644 --- a/specification/ch_preface.xml +++ b/specification/ch_preface.xml @@ -19,7 +19,55 @@ xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0" - xml:id="ch_PAPR_preface"> + xml:id="ch_ABI_preface"> + About this Document + + This specification defines the OpenPOWER ELF V2 application binary interface (ABI). + This ABI is derived from and represents the first major update to the Power ABI since the + original release of the IBM® RS/6000® ABI. It was developed to make extensive use of new + functions available in OpenPOWER-compliant processors. It expects an OpenPOWER-compliant + processor to implement at least Power ISA V2.07B with all OpenPOWER Architecture instruction + categories as well as OpenPOWER-defined implementation characteristics for some + implementation-specific features. + + + Specifically, to use this ABI and ABI-compliant programs, OpenPOWER-compliant + processors must implement the following categories: + + Base + + 64-Bit + + Server (subject to system-level requirements) + + Floating-Point + + Floating-Point.Record + + Load/Store Quadword x2 + + Store Conditional Page Mobility (subject to system-level requirements) + + Stream + + Transactional Memory + + Vector + + Vector.Little-Endian + + Vector-Scalar + + + + For more information about these categories, see “Categories” in Book I of + Power ISA, version 2.07B. 
+ + + The OpenPOWER ELF V2 ABI is intended for use in little- and big-endian environments. + + +
Notices @@ -61,6 +109,10 @@ www.freescale.com/files/abstract/help_page/TERMSOFUSE.html. + Itanium, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered + trademarks of Intel Corporation or its subsidiaries in the United States and other countries. + + Linux is a trademark of Linus Torvalds in the United States, other countries, or both. @@ -70,5 +122,6 @@ Other company, product, and service names may be trademarks or service marks of others. - +
+
diff --git a/specification/pom.xml b/specification/pom.xml
index 5fa768f..87066c0 100644
--- a/specification/pom.xml
+++ b/specification/pom.xml
@@ -53,10 +53,10 @@
 article/appendix nop
 article toc,title
 book toc,title,figure,table,example,equation
- book/appendix nop
+ book/appendix nop
 book/chapter nop
 chapter toc,title
- chapter/section nop
+ chapter/section nop
 section toc
 part toc,title
 qandadiv toc