microwatt

Commit Graph

Author	SHA1	Message	Date
Paul Mackerras	09b340e845	FPU: Update committed FPSCR value correctly The committed FPSCR is updated in the cycle where an FPU instruction signals completion. Since we update the FPRF field in the FPSCR in that same cycle, the value put into r.comm_fpscr needs to include the new FPRF value. Otherwise, a subsequent flush (for example, due to the following instruction being an illegal instruction that has to be emulated) will drop the FPSCR update. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	1ad8848655	FPU: Improve zero result detection and simplify final states This improves detection of results that are exactly zero in FINISH state by noting that on entry to FINISH state, if R is zero then X must also be zero, so no rounding needs to be done and no underflow exists. Therefore we can set rcls_op = RCLS_TZERO to test for zero and exit early if R = 0. The RCLS_TZERO test now tests the whole of R just in case. The rest of the following states have been streamlined and simplified. In cases of underflow, we only need to take action before rounding in the UE=0 case (disabled underflow exception), where we need to denormalize before rounding. For enabled underflow cases we just use the existing NORMALIZE state, which lets us remove NORM_UFLOW state. On entry to ROUNDING state, R can be zero or denorm only for round to integer instructions (fri) or for disabled underflow exception cases. Note that in case of underflow with UE=0, the exception is only actually signalled if there is loss of accuracy, i.e. if FPSCR[FI] will be set. This is now done at the end of ROUNDING state. For underflow with UE=1, we go to a new ROUND_UFLOW_EN state to adjust the exponent from ROUNDING, ROUNDING_2 or ROUNDING_3 state. In the ROUNDING states, we avoid shifting left to normalize a result with exponent <= -1022, because if we did we would then just need to denormalize again. This lets us get rid of DENORM state. Finally, noticing that DO_FRSP_2 state does much the same as FINISH state lets us remove DO_FRSP_2 state and go to FINISH state from DO_FRSP. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	f8a11420ca	FPU: Check for rounding overflow in 32-bit convert-to-integer operations Without this, rounding a value of 0xFFFFFFFF up, giving 0x100000000, will yield an incorrect result of zero. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	6fe4b549f5	FPU: Improve accuracy in multiply-add almost-cancellation cases There are two paths for multiply-add instructions; one where the product is larger or nearly the same as the addend, which does the addition/subtraction in the multiplier with 128-bit accuracy; the other is used when the addend is clearly larger, which shifts the product right before doing the addition/subtraction in 64-bit arithmetic. The threshold for the second path is that B_exp has to be greater than A_exp + C_exp + 1, the +1 being because the product mantissa can be greater than 2. This increases the +1 to +2 to make sure that the 128-bit path is used when there is any chance of cancellation of the high-order bits of the sum. With the +1 threshold we could still get close to cancellation when the mantissas of A and C were nearly 2 and the mantissa of B was 1. This improves accuracy and avoids the need to do a 120-bit subtraction in the second path. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	80c81b58ef	FPU: Generate correct result sign when B is denormal If a subtraction A - B is done where A is in normalized form with an exponent of -1022, and B is denormal, an inconsistency arises between the comparison of the raw exponents in the first cycle, which sees A.exp (0x001) > B.exp (0x000), and the comparison in DO_FADD state, which sees r.a.exponent (-1022) = r.b.exponent (-1022). Consequently we get r.add_bsmall = 0 and the subtraction is done the wrong way around, yielding the wrong sign for the result. Fix this by setting r.add_bsmall according to the comparison of raw exponents in the first cycle and then using it in DO_FADD state. Also add a test case for this. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	f631dcd700	FPU: Set FPRF correctly on multiply result that underflows rcls_op being set to RCLS_TZERO was not detecting a zero result after rounding for a multiply result that underflows, because S still had low bits of the product. To fix this, remove the 's_nz = 0' from the RCLS_TZERO test. We can't then use this test in the FMADD_6 state, but we really shouldn't be testing for zero there, before rounding, so remove that. Also simplify FMADD_6 state by not setting rs_norm and going always to FINISH state rather than going to NORMALIZE state. Add a test for this case (actually a fmadd with B=0). While here, remove a pointless assignment to f_to_multiply.valid in MULT_1 state, since r.first is never set here. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	b122577a4e	FPU: Be more careful about preserving low-order bits in multiply-add instrs Add code to check whether bits of S which don't get shifted into R are non-zero, and set X if they are, so that rounding in multiply-add instructions works correctly. This needs to be done after normalization in the case of very small results, where potentially all the non-zero bits in S do get shifted into R. Also fix an incorrect test case, and add another multiply-add test case. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	59992eab90	FPU: Avoid doing overflow processing twice in OE=1 case Split the ROUND_OFLOW state into two, one which handles the OE=0 case (disabled overflow exception) and one which handles the OE=1 case (enabled overflow exception). This avoids a loop in the state diagram and prevents us from adding the exponent bias twice. Also correct a bug in ROUNDING_3 state where for single-precision operations which yield a result which is denormal in double-precision format, r.shift was set wrongly. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	9f27f60b26	FPU: Clear FPSCR[FR,FI] on overflow in convert-to-integer instructions Also simplify INT_CHECK state by going to INT_OFLOW on overflow. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	37edba4da7	FPU: Normalize B operand for multiply-add instructions Otherwise the result can get rounded incorrectly when B is denorm but the A * C product is much smaller. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	d33f31509b	FPU: Clear S in ADD_SHIFT state Otherwise, if this is a multiply-add instruction and the result needs to be shifted left, bits of the product in S will contaminate the final result. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	b8f7cbd894	FPU: Record bits shifted out of addend in fmadd-family instructions If the addend is smaller than the product and thus needs to be shifted right, record if any bits are lost from the right end in r.x, so that the result gets rounded correctly. Also add a test that checks one such case. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	009ee1c9c5	FPU: Renormalize frsp operand if denormalized This arranges for the frsp operand to be renormalized if necessary. Without this, we can incorrectly get X set to 1 for denormalized operands, and hence the rounding may be done incorrectly. To make things clearer, we now have an explicit flag indicating when the B operand needs to be in normalized form. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	baf8f5f8c6	FPU: Force reserved FPSCR bit 11 to zero This ensures that the reserved FPSCR bit can never be set, by clearing it at the end of the fpu_1 process. Also remove a redundant setting of cr_result in the mcrfs code. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	a18c462b27	FPU: Ignore stale P contents in short-circuit multiply-add When a multiply-add is done with A or C equal to zero, the actual multiplication operation is not done, hence P is not valid, so in FINISH state we shouldn't set X based on P being non-zero. Fix this by clearing the is_multiply flag in the short-circuit case. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	41988e3b5f	FPU: Fix comparison of remainder in square root code The square root procedure needs to compare B - R^2 with 2R + 1 to decide whether to increment the square root estimate R by 1. It currently does this by putting 2R + 1 in B and using the pcmpb_lt and pcmpb_eq signals. This is not correct because the comparisons that generate those signals have a 2-bit shift embedded into them. Instead, put 2R + 1 into C and use pcmpc_lt/eq, which don't have the 2-bit shift. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	f3b9566ae2	FPU: Round to single precision for fcfid[u]s The fcfids and fcfidus instructions weren't rounding to single precision because r.longmask wasn't getting set. To fix this, set v.longmask to e_in.single for the fcfid* instructions. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	e5651e2eab	FPU: Avoid adding bias twice in UE=1 underflow case In case of underflow with UE=1, ROUND_UFLOW state adds the exponent bias and then goes to NORMALIZE state if the value is not normalized. Then NORMALIZE state will go back to ROUND_UFLOW if the exponent is still tiny, resulting in the bias getting added twice. To avoid this, if ROUND_UFLOW needs to do normalization, it goes to a new NORM_UFLOW state which does the normalization and goes to ROUNDING state. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	a0755935f4	FPU: Normalize B for fmadd family instructions If B is denormalized, but the A*C product is much smaller, then the result is B; in the UE=1 case we need to normalize the result, and the left shift to do that can bring in low-order product bits from S and corrupt the result. To avoid this, make sure B is normalized. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	32919435a3	FPU: Allow mtfsb* to set FPSCR[FX] implicitly If mtfsb1 causes an individual exception bit to go from 0 to 1, that should set FX as well. Arrange for this by setting update_fx to 1. Also make sure mcrfs doesn't copy the reserved FPSCR bit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	e471581222	FPU: Do result processing on denorm short-circuit results when FPSCR[UE] is set Results that are tiny (i.e., in the denorm range) need special processing when underflow exceptions are enabled, including in the cases where the result is just one of the input operands, such as for a fmadd with A or C equal to zero. To make sure this gets done, go to FINISH state rather than returning the relevant input operand as the result. The same logic is now used when the result needs to be rounded to single precision. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	0478fe41dd	FPU: Reset FPSCR[FR,FI] at beginning of fcfid* Otherwise a non-zero setting from a previous instruction won't get cleared. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	f252dba43d	FPU: Only apply zero subtraction result sign rule when result is exactly zero The rule in the ISA about the sign of the result of a subtraction when the magnitude of the result is zero only applies when the operands are equal in magnitude but opposite in sign, i.e. when the result is exactly zero. Add a check using FPSCR[FI] to exclude the cases where the exact result is non-zero but gets truncated to zero by rounding. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	8a204f1058	FPU: Set FPSCR exception summary based on individual invalid exception bits Rather than setting FPSCR[FX] to 1 when FPSCR[VX] transitions from 0 to 1, this sets it when any of the individual invalid exception bits (VXSNAN, VXISI, VXIDI, VXZDZ, VXIMZ, VXVC, VXSOFT, VXSQRT, VXCVI) transitions from 0 to 1. This better matches the ISA and P9 behaviour. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	fb71f62b83	FPU: Round finite special-case results to single precision if required When a special case is detected, such as a zero operand to an add, and the operation is a single-precision operation such as fadds, we need to round the result to single precision instead of just returning the relevant input operand unmodified. This accomplishes that by going to DO_FRSP_2 state from the special-case code for single-precision operations that return a finite floating-point result. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	de71a6119c	FPU: Make FPSCR bit 11 always read as 0 Bit 11 (52 in BE numbering) is a reserved bit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	ca792f3b13	FPU: Make convert-to-integer-word instructions behave like P9 The fctiw* instructions return a copy of the value in bits 31..0 in bits 63..32 of the result on P9, rather than a sign or zero extension of the word result. Make the FPU do the same. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	82825a11ba	FPU: Set result sign correctly for denorm +/- 0 case Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	37b1afc7f7	FPU: Make fri* instructions set FPSCR[FR,FI] to zero As required by the ISA. Also, never generate an inexact exception. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	dcd85164c6	FPU: Make fsel not alter FPSCR fsel is a move-type instruction, and hence shouldn't affect FPSCR. Set v.writing_fpr and v.instr_done, rather than setting arith_done, to achieve this. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	066e38b8ea	FPU: Do proper over/underflow handling for single-precision [fm]add The ADD_3 state incorporated some of the logic of the FINISH state, but in some cases assumed the result couldn't overflow or underflow - which is not true for single precision operations, if the input operands are outside the single precision range. Fix this, and simplify things, by having ADD_3 always go to FINISH state, which does the full overflow and underflow checking. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	d540171f60	FPU: Ignore Rc bit for mffs* variants other than plain mffs Bit 0 of the instruction is Rc for mffs but reserved for the other mffs* instructions. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	0e11f80f2f	FPU: Set FPSCR[FPRF] to zero for convert to integer operations This seems to be what P9 does. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	2f29daab2d	FPU: Fix setting of r.x for single-precision operations The fp_rounding function expects r.x to have been set based on the lower 31 bits of r.r, not 29 as presently done, so change 28 to SP_RBIT-1 (SP_RBIT is 31). Also add a test to check. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	ab3783b61b	FPU: Fix setting of r.x Having computed rormr, use it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	e60840eabc	FPU: Make sure FR and FI in FPSCR get reset on special-case arith instructions Arithmetic instructions where the result is determined without doing any actual computation (i.e. the input(s) are NaNs, infinities, zeroes etc.) weren't resetting FR and FI properly. This combines the two blocks that handle the r.cycle_1_ar = 1 case to fix it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 week ago
Paul Mackerras	4282d37741	FPU: Faster method for testing for 1-bits at right end of R At various points we need to set the X bit if any bit of R which would be shifted out by a right shift of N bits is a 1. We can do this by computing R | -R, which contains a 1 in the position of the right-most 1-bit in R and in all positions to the left, and zeroes to the right. That means we can test for the least-significant N bits being non-zero by testing whether bit N-1 of (R | -R) is a 1. Doing this uses fewer LUTs and has better timing than the old method of generating a mask, ANDing it with R, and testing whether the result is non-zero. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	8 months ago
Paul Mackerras	3268ef717c	FPU: Make opsel_a a function of just the state This adds some extra states and transitions so that opsel_a becomes a function only of the current state. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	73505b1626	FPU: Provide a separate path for transferring A/B/C to R The timing path from r.a.class to result showed up as a critical path on the Artix-7, apparently because of transfers of A, B or C to R in special cases (e.g. NaN inputs) and the fsel instruction. To alleviate this, we provide a path via the miscellaneous value multiplexer from A, B and C to R, selected via opsel_R = RES_MISC and misc_sel = 111. A new selector opsel_sel selects which of A, B or C to transfer, using the same encoding as opsel_a. This new selector is now also used for the result class when rcls_op = RCLS_SEL and for the result sign when rsgn_op = RSGN_SEL. This reduces the number of things that opsel_a depends on and eases timing in the main adder path. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	b63773f6e9	FPU: Move computation of main adder inputs out of the state machine Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	b4aae8511d	FPU: Move special case handling to a separate process This creates a new fpu_specialcases process that handles most of the logic that was previously in the DO_NAN_INF and DO_ZERO_DEN states. What remains of those states, i.e. the handling of denormalized inputs, is in a new DO_SPECIAL state. The state machine goes into DO_SPECIAL state after IDLE for any arithmetic operation where an input is a NaN, infinity, zero or denormalized value. Doing this means that the rest of the state machine won't try to start any computation which would need to be overridden by the logic to produce the result value selected by the fpu_specialcases process. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	b1bd2aa865	FPU: Make set_r independent of multiply_to_f.valid Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	fcfdbc449c	FPU: Move condition register calculations to an explicit data path Instead of calculating v.cr_result in the state machine, we now have the state machine set a 'cr_op' variable which then controls what computation the CR data path does to set v.cr_result. The CR data path also handles updating the XERC result bits for integer operations (division and modulus). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	bbc485f336	FPU: Rework inputs to the main adder With this, the A input no longer has R as an option but now takes the rounding constants and the low-order bits of P (used as an adjustment in the square root algorithm). The B input has either R or zero. Both inputs can be optionally inverted for subtraction. The select inputs to the multiplexers now have 3 bits in opsel_a and 1 bit in opsel_b. The states which need R to be set now explicitly have set_r := 1 even though that is the default, essentially for documentation reasons. Similarly some states set opsel_b <= BIN_R even though that is the default. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	0e7c11a0e4	FPU: Move result_class logic outside of state machine The various states choose one of four operations (including no-op) to be done on result_class. Some operations have side-effects on arith_done or FPSCR. The DO_NAN_INF and DO_ZERO_DEN states still set result_class directly since their logic is expected to move out to a separate process later. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	5f0b2d433d	FPU: Simplify calculation of result_class For the various arithmetic operators, we only get to the DO_* states when the inputs are finite (not zero, infinity or NaN), so we can replace setting of v.result_class to r.a.class or r.b.class with an overall setting of it to FINITE in cycle 1 of all those operations. Also, integer division doesn't need to set the result class since the result is integer. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	70819c4c39	FPU: Do renormalization from DO_ZERO_DEN state Instead of having the various DO_* states (DO_FMUL, DO_FDIV, etc.) handle checking for denormalized inputs, we now have DO_ZERO_DEN state check for denormalized inputs and branch to RENORM_{A,B,C} to handle them. This also meant some changes were needed in how fsqrt and frsqrte handled inputs with odd exponent. The DO_FSQRT and DO_FRSQRTE states were very similar and have been combined into one. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	8648ddb64f	FPU: Eliminate EXC_RESULT state This lets us remove r.opsel_a and is a step towards moving the handling of exceptional cases out to a separate process. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	850b87c83f	FPU: Get rid of r.madd_cmp and r.exp_cmp This saves a few LUTs and simplifies the code a little. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	ba2add029a	FPU: Remove need to set opsel_a one cycle ahead Most states set opsel_a directly to select the operand for the A input of the main adder. The exception is the EXC_RESULT state, which uses r.opsel_a set by the previous cycle to indicate which input operand to use as the result. In order to make timing, ensure that the controls that select the inputs to the main adder (opsel_*, etc.) don't depend on any complicated functions of the data (such as px_nz, pcmpb_eq, pcmpb_lt, etc.), but are as far as possible constant for each state. There is now a control called set_r for whether the result is written to r.r, which enables us to avoid setting opsel_b or opsel_r conditionally in some cases. Also, to avoid a data-dependent setting of msel_2 in IDIV_DODIV state, the IDIV_NR1 and IDIV_NR2 states have been reworked so that completion of the required number of iterations is checked in IDIV_NR1 state, and at that point, if the inverse estimate is < 0.5, we go to IDIV_USE0_5 state in order to use 0.5 as the estimate. This means that in the normal case, the inverse estimate is already in Y when we get to IDIV_DODIV state. IDIV_USE0_5 has been reworked to put R (which will contain 0.5) into Y as the inverse estimate. That means that IDIV_DODIV state doesn't have any data-dependent logic to put either P or R into Y. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
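
Commit 4282d37741 in the list above describes testing whether the least-significant N bits of R are non-zero by looking at bit N-1 of (R | -R). The identity can be checked with a minimal Python sketch. The 64-bit width and the function names here are assumptions made for illustration; they are not taken from the Microwatt source, which is VHDL.

```python
WIDTH = 64                 # assumed register width, for illustration only
MASK = (1 << WIDTH) - 1

def low_bits_nonzero_mask(r: int, n: int) -> bool:
    """Old method from the commit message: mask off the low n bits of r
    and test whether the result is non-zero."""
    return (r & ((1 << n) - 1)) != 0

def low_bits_nonzero_neg(r: int, n: int) -> bool:
    """New method: test bit n-1 of (r | -r), with -r taken modulo 2**WIDTH.

    (r | -r) has a 1 at the right-most 1-bit of r and at every position
    to its left, and zeroes to its right.
    """
    r_or_neg = (r | -r) & MASK
    return ((r_or_neg >> (n - 1)) & 1) == 1
```

Both functions expect `0 <= r < 2**WIDTH` and `1 <= n <= WIDTH`, and should agree for every such pair. The second form needs only a negate, an OR and a single bit select, which is consistent with the LUT and timing benefit the commit message reports.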


107 Commits (09b340e845fc470e557a9c9a4c39bd2eaa54fac4)
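
Commit 80c81b58ef describes an inconsistency between comparing raw exponent fields and comparing effective exponents when one operand is denormal. The sketch below reproduces that mismatch on IEEE 754 doubles in Python. The helper names are hypothetical, and the "effective exponent" rule (raw field 0 is treated as -1022) is an assumption inferred from the values quoted in the commit message rather than from the FPU source.

```python
import struct

def raw_exponent(x: float) -> int:
    """Return the 11-bit biased exponent field of a double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF

def effective_exponent(x: float) -> int:
    """Unbiased exponent for a finite non-zero double.

    Denormals (raw field 0) share the exponent -1022 with the smallest
    normal numbers. Infinities and NaNs are not handled here.
    """
    raw = raw_exponent(x)
    return -1022 if raw == 0 else raw - 1023

a = 2.0 ** -1022   # smallest normal double, raw exponent 0x001
b = 2.0 ** -1030   # a denormal, raw exponent 0x000

raw_says_a_bigger = raw_exponent(a) > raw_exponent(b)
effective_says_equal = effective_exponent(a) == effective_exponent(b)
```

With these operands the raw comparison sees A.exp > B.exp while the effective exponents compare equal, which is the disagreement the commit resolves by deciding r.add_bsmall from the raw comparison in the first cycle.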