Commit Graph

1516 Commits (09b340e845fc470e557a9c9a4c39bd2eaa54fac4)
 

Author SHA1 Message Date
Paul Mackerras 09b340e845 FPU: Update committed FPSCR value correctly
The committed FPSCR is updated in the cycle where an FPU instruction
signals completion.  Since we update the FPRF field in the FPSCR in
that same cycle, the value put into r.comm_fpscr needs to include
the new FPRF value.  Otherwise, a subsequent flush (for example,
due to the following instruction being an illegal instruction that
has to be emulated) will drop the FPSCR update.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 1ad8848655 FPU: Improve zero result detection and simplify final states
This improves detection of results that are exactly zero in FINISH state
by noting that on entry to FINISH state, if R is zero then X must also
be zero, so no rounding needs to be done and no underflow exists.
Therefore we can set rcls_op = RCLS_TZERO to test for zero and exit
early if R = 0.  The RCLS_TZERO test now tests the whole of R just in
case.

The rest of the following states have been streamlined and simplified.
In cases of underflow, we only need to take action before rounding in
the UE=0 case (disabled underflow exception), where we need to denormalize
before rounding.  For enabled underflow cases we just use the existing
NORMALIZE state, which lets us remove NORM_UFLOW state.

On entry to ROUNDING state, R can be zero or denorm only for round to
integer instructions (fri*) or for disabled underflow exception cases.
Note that in case of underflow with UE=0, the exception is only actually
signalled if there is loss of accuracy, i.e. if FPSCR[FI] will be set.
This is now done at the end of ROUNDING state.  For underflow with UE=1,
we go to a new ROUND_UFLOW_EN state to adjust the exponent from
ROUNDING, ROUNDING_2 or ROUNDING_3 state.

In the ROUNDING* states, we avoid shifting left to normalize a result
with exponent <= -1022, because if we did we would then just need to
denormalize again.  This lets us get rid of DENORM state.

Finally, noticing that DO_FRSP_2 state does much the same as FINISH
state lets us remove DO_FRSP_2 state and go to FINISH state from
DO_FRSP.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras f8a11420ca FPU: Check for rounding overflow in 32-bit convert-to-integer operations
Without this, rounding a value of 0xFFFFFFFF up, giving 0x100000000, will
yield an incorrect result of zero.
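A minimal Python sketch of the failure mode, not the FPU's VHDL. The 32-bit magnitude is modelled as a plain integer, and `round_up_u32` is a hypothetical helper that performs only the increment step of rounding, together with the carry-out check this commit adds:

```python
MASK32 = 0xFFFF_FFFF


def round_up_u32(value):
    """Increment a 32-bit unsigned magnitude, as rounding up would.

    Returns (truncated, overflow): the value a 32-bit datapath would keep,
    and whether the increment carried out of bit 31.
    """
    rounded = value + 1            # full-width result, e.g. 0x1_0000_0000
    truncated = rounded & MASK32   # what survives in 32 bits
    overflow = rounded > MASK32    # the check that catches the wrap to zero
    return truncated, overflow


print(round_up_u32(0xFFFF_FFFF))  # (0, True)
print(round_up_u32(0x7FFF_FFFF))  # (2147483648, False)
```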

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 6fe4b549f5 FPU: Improve accuracy in multiply-add almost-cancellation cases
There are two paths for multiply-add instructions; one where the
product is larger or nearly the same as the addend, which does the
addition/subtraction in the multiplier with 128-bit accuracy; the
other is used when the addend is clearly larger, which shifts the
product right before doing the addition/subtraction in 64-bit
arithmetic.  The threshold for the second path is that B_exp has
to be greater than A_exp + C_exp + 1, the +1 being because the product
mantissa can be greater than 2.

This increases the +1 to +2 to make sure that the 128-bit path is used
when there is any chance of cancellation of the high-order bits of the
sum.  With the +1 threshold we could still get close to cancellation
when the mantissas of A and C were nearly 2 and the mantissa of B was
1.  This improves accuracy and avoids the need to do a 120-bit
subtraction in the second path.
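A Python sketch of the path-selection test on unbiased exponents. `uses_128bit_path` and its `margin` parameter are hypothetical names, with `margin` standing for the +1/+2 constant discussed above. The example is the worst case from the message: A and C mantissas close to 2, so the product is close to 2**(A_exp + C_exp + 2), and B with mantissa 1 and exponent A_exp + C_exp + 2.

```python
def uses_128bit_path(a_exp, b_exp, c_exp, margin):
    """True unless the addend B is 'clearly larger' than the product A*C.

    The 64-bit (addend-larger) path is taken only when
    b_exp > a_exp + c_exp + margin.
    """
    return not (b_exp > a_exp + c_exp + margin)


a_exp, c_exp = 0, 0
b_exp = a_exp + c_exp + 2   # B is about as large as the product: cancellation
print(uses_128bit_path(a_exp, b_exp, c_exp, margin=1))  # False (old +1 threshold)
print(uses_128bit_path(a_exp, b_exp, c_exp, margin=2))  # True  (new +2 threshold)
```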

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 80c81b58ef FPU: Generate correct result sign when B is denormal
If a subtraction A - B is done where A is in normalized form with an
exponent of -1022, and B is denormal, an inconsistency arises between
the comparison of the raw exponents in the first cycle, which sees
A.exp (0x001) > B.exp (0x000), and the comparison in DO_FADD state,
which sees r.a.exponent (-1022) = r.b.exponent (-1022).  Consequently
we get r.add_bsmall = 0 and the subtraction is done the wrong way
around, yielding the wrong sign for the result.

Fix this by setting r.add_bsmall according to the comparison of raw
exponents in the first cycle and then using it in DO_FADD state.
Also add a test case for this.
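The inconsistency can be reproduced outside the FPU with a short Python sketch. It reads the exponent field of IEEE 754 doubles via `struct`. `raw_exponent` and `effective_exponent` are illustrative helpers for finite values, not names from the VHDL:

```python
import struct


def raw_exponent(x):
    """Biased 11-bit exponent field of an IEEE 754 double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF


def effective_exponent(x):
    """Unbiased exponent of a finite double; denormals (raw 0) act as raw 1."""
    return max(raw_exponent(x), 1) - 1023


a = 2.0 ** -1022   # smallest normalized double, raw exponent 0x001
b = 2.0 ** -1023   # denormal, raw exponent 0x000
print(raw_exponent(a), raw_exponent(b))              # 1 0
print(effective_exponent(a), effective_exponent(b))  # -1022 -1022
```

Comparing raw fields says A has the larger exponent, but comparing effective exponents says they are equal. This is why the decision needs to be made once, from the raw comparison, and carried forward.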

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras f631dcd700 FPU: Set FPRF correctly on multiply result that underflows
rcls_op being set to RCLS_TZERO was not detecting a zero result after
rounding for a multiply result that underflows, because S still had
low bits of the product.  To fix this, remove the 's_nz = 0' from the
RCLS_TZERO test.  We can't then use this test in the FMADD_6 state,
but we really shouldn't be testing for zero there, before rounding,
so remove that.  Also simplify FMADD_6 state by not setting rs_norm
and going always to FINISH state rather than going to NORMALIZE state.

Add a test for this case (actually a fmadd with B=0).

While here, remove a pointless assignment to f_to_multiply.valid in
MULT_1 state, since r.first is never set here.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras b122577a4e FPU: Be more careful about preserving low-order bits in multiply-add instrs
Add code to check whether bits of S which don't get shifted into R are
non-zero, and set X if they are, so that rounding in multiply-add
instructions works correctly.  This needs to be done after normalization
in the case of very small results, where potentially all the non-zero
bits in S do get shifted into R.

Also fix an incorrect test case, and add another multiply-add test case.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 59992eab90 FPU: Avoid doing overflow processing twice in OE=1 case
Split the ROUND_OFLOW state into two, one which handles the OE=0 case
(disabled overflow exception) and one which handles the OE=1 case
(enabled overflow exception).  This avoids a loop in the state diagram
and prevents us from adding the exponent bias twice.

Also correct a bug in ROUNDING_3 state where for single-precision
operations which yield a result which is denormal in double-precision
format, r.shift was set wrongly.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 9f27f60b26 FPU: Clear FPSCR[FR,FI] on overflow in convert-to-integer instructions
Also simplify INT_CHECK state by going to INT_OFLOW on overflow.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 37edba4da7 FPU: Normalize B operand for multiply-add instructions
Otherwise the result can get rounded incorrectly when B is denorm but
the A * C product is much smaller.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras d33f31509b FPU: Clear S in ADD_SHIFT state
Otherwise, if this is a multiply-add instruction and the result needs
to be shifted left, bits of the product in S will contaminate the
final result.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras b8f7cbd894 FPU: Record bits shifted out of addend in fmadd-family instructions
If the addend is smaller than the product and thus needs to be shifted
right, record if any bits are lost from the right end in r.x, so that
the result gets rounded correctly.
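The general technique is a right shift that also yields a "sticky" bit recording whether anything nonzero was lost. Here is a minimal Python sketch, with `shift_right_sticky` as a hypothetical helper and the mantissa modelled as an unbounded non-negative integer:

```python
def shift_right_sticky(value, amount):
    """Shift a non-negative mantissa right and report lost bits.

    Returns (shifted, sticky), where sticky is True if any 1 bit was
    shifted out of the right-hand end, so that a later rounding step
    knows the shifted value is inexact.
    """
    if amount <= 0:
        return value, False
    lost = value & ((1 << amount) - 1)
    return value >> amount, lost != 0


print(shift_right_sticky(0b1011, 2))  # (2, True)
print(shift_right_sticky(0b1000, 2))  # (2, False)
```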

Also add a test that checks one such case.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 009ee1c9c5 FPU: Renormalize frsp operand if denormalized
This arranges for the frsp operand to be renormalized if necessary.
Without this, we can incorrectly get X set to 1 for denormalized
operands, and hence the rounding may be done incorrectly.  To make
things clearer, we now have an explicit flag indicating when the B
operand needs to be in normalized form.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras baf8f5f8c6 FPU: Force reserved FPSCR bit 11 to zero
This ensures that the reserved FPSCR bit can never be set, by clearing
it at the end of the fpu_1 process.

Also remove a redundant setting of cr_result in the mcrfs code.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras a18c462b27 FPU: Ignore stale P contents in short-circuit multiply-add
When a multiply-add is done with A or C equal to zero, the actual
multiplication operation is not done, hence P is not valid, so in
FINISH state we shouldn't set X based on P being non-zero.  Fix this
by clearing the is_multiply flag in the short-circuit case.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 41988e3b5f FPU: Fix comparison of remainder in square root code
The square root procedure needs to compare B - R^2 with 2R + 1 to
decide whether to increment the square root estimate R by 1.  It
currently does this by putting 2R + 1 in B and using the pcmpb_lt
and pcmpb_eq signals.  This is not correct because the comparisons
that generate those signals have a 2-bit shift embedded into them.
Instead, put 2R + 1 into C and use pcmpc_lt/eq, which don't have
the 2-bit shift.
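The arithmetic behind the comparison, as a Python sketch on plain integers. The sketch ignores the fixed-point scaling and the comparator shifts that the FPU deals with. `refine_sqrt_estimate` is a hypothetical name, and it assumes the estimate is at most one below the true integer square root. Because (R + 1)^2 = R^2 + 2R + 1, R + 1 is still no larger than the square root exactly when B - R^2 >= 2R + 1:

```python
import math


def refine_sqrt_estimate(b, r):
    """Bump an integer square-root estimate r by one if it is too small."""
    remainder = b - r * r
    if remainder >= 2 * r + 1:   # (r + 1)**2 <= b
        r += 1
    return r


print(refine_sqrt_estimate(17, 3))  # 4, since 4*4 <= 17
print(refine_sqrt_estimate(15, 3))  # 3, since 4*4 > 15

# Cross-check against Python's exact integer square root.
for n in range(1, 1000):
    exact = math.isqrt(n)
    assert refine_sqrt_estimate(n, exact - 1) == exact
    assert refine_sqrt_estimate(n, exact) == exact
```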

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras f3b9566ae2 FPU: Round to single precision for fcfid[u]s
The fcfids and fcfidus instructions weren't rounding to single
precision because r.longmask wasn't getting set.  To fix this, set
v.longmask to e_in.single for the fcfid* instructions.
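To show what the missing rounding means numerically, here is a Python sketch that rounds to single precision via `struct`'s `'f'` format. On typical IEEE platforms this uses the C double-to-float conversion in the default round-to-nearest-even mode. `round_to_single` is a hypothetical helper. The example integers are exactly representable as doubles, so the intermediate Python float introduces no extra rounding.

```python
import struct


def round_to_single(x):
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# 2**24 + 1 needs 25 significant bits, so a single-precision result
# must round it; both examples are ties and go to the even neighbour.
print(round_to_single(float(2**24 + 1)))  # 16777216.0
print(round_to_single(float(2**24 + 3)))  # 16777220.0
```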

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras e5651e2eab FPU: Avoid adding bias twice in UE=1 underflow case
In case of underflow with UE=1, ROUND_UFLOW state adds the exponent
bias and then goes to NORMALIZE state if the value is not normalized.
Then NORMALIZE state will go back to ROUND_UFLOW if the exponent is
still tiny, resulting in the bias getting added twice.  To avoid this,
if ROUND_UFLOW needs to do normalization, it goes to a new NORM_UFLOW
state which does the normalization and goes to ROUNDING state.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras a0755935f4 FPU: Normalize B for fmadd family instructions
If B is denormalized, but the A*C product is much smaller, then the
result is B; in the UE=1 case we need to normalize the result, and the
left shift to do that can bring in low-order product bits from S and
corrupt the result.  To avoid this, make sure B is normalized.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 32919435a3 FPU: Allow mtfsb* to set FPSCR[FX] implicitly
If mtfsb1 causes an individual exception bit to go from 0 to 1, that
should set FX as well.  Arrange for this by setting update_fx to 1.
Also make sure mcrfs doesn't copy the reserved FPSCR bit.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras e471581222 FPU: Do result processing on denorm short-circuit results when FPSCR[UE] is set
Results that are tiny (i.e., in the denorm range) need special
processing when underflow exceptions are enabled, including in the
cases where the result is just one of the input operands, such as for
a fmadd with A or C equal to zero.  To make sure this gets done, go to
FINISH state rather than returning the relevant input operand as the
result.  The same logic is now used when the result needs to be rounded
to single precision.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 0478fe41dd FPU: Reset FPSCR[FR,FI] at beginning of fcfid*
Otherwise a non-zero setting from a previous instruction won't get cleared.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras f252dba43d FPU: Only apply zero subtraction result sign rule when result is exactly zero
The rule in the ISA about the sign of the result of a subtraction when
the magnitude of the result is zero only applies when the operands are
equal in magnitude but opposite in sign, i.e. when the result is exactly
zero.  Add a check using FPSCR[FI] to exclude the cases where the exact
result is non-zero but gets truncated to zero by rounding.
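A Python sketch of the resulting decision for an effective subtraction. It assumes the rule in question is the IEEE 754 one: an exact zero is +0 except when rounding toward minus infinity. `zero_result_sign` and its parameters are hypothetical names. The `fi` argument plays the role of FPSCR[FI], meaning the result was inexact.

```python
def zero_result_sign(computed_sign, result_is_zero, fi, round_toward_minus_inf):
    """Sign bit (0 = +, 1 = -) of an effective-subtraction result.

    The special zero-sign rule applies only when the exact result is zero
    (fi clear). A nonzero value that rounded to zero keeps its own sign.
    """
    if result_is_zero and not fi:
        return 1 if round_toward_minus_inf else 0
    return computed_sign


print(zero_result_sign(1, True, False, False))  # 0: x - x gives +0
print(zero_result_sign(1, True, True, False))   # 1: tiny negative rounded to -0
```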

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 8a204f1058 FPU: Set FPSCR exception summary based on individual invalid exception bits
Rather than setting FPSCR[FX] to 1 when FPSCR[VX] transitions from 0 to 1,
this sets it when any of the individual invalid exception bits (VXSNAN,
VXISI, VXIDI, VXZDZ, VXIMZ, VXVC, VXSOFT, VXSQRT, VXCVI) transitions from
0 to 1.  This better matches the ISA and P9 behaviour.
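A Python sketch of the new rule, treating FPSCR as an integer. The bit numbers assume the low 32-bit FPSCR word counted from the least significant bit, so FX is bit 31 and VX is bit 29. `update_fx` is a hypothetical helper, not the VHDL. The example shows the case the old rule missed: VX is already set by an earlier VXSNAN, so VX does not transition, but a newly set VXISI still has to set FX.

```python
# Individual invalid-operation exception bits (assumed LSB-0 numbering).
VX_BITS = {
    "VXSNAN": 24, "VXISI": 23, "VXIDI": 22, "VXZDZ": 21, "VXIMZ": 20,
    "VXVC": 19, "VXSOFT": 10, "VXSQRT": 9, "VXCVI": 8,
}
VX_MASK = sum(1 << bit for bit in VX_BITS.values())
FX = 1 << 31
VX = 1 << 29  # summary bit, deliberately not part of VX_MASK


def update_fx(old_fpscr, new_fpscr):
    """Set FX if any individual invalid-operation bit went from 0 to 1."""
    if new_fpscr & ~old_fpscr & VX_MASK:
        new_fpscr |= FX
    return new_fpscr


before = (1 << VX_BITS["VXSNAN"]) | VX
after = before | (1 << VX_BITS["VXISI"])
print(hex(update_fx(before, after)))  # 0xa1800000
```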

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras fb71f62b83 FPU: Round finite special-case results to single precision if required
When a special case is detected, such as a zero operand to an add,
and the operation is a single-precision operation such as fadds,
we need to round the result to single precision instead of just
returning the relevant input operand unmodified.  This accomplishes
that by going to DO_FRSP_2 state from the special-case code for
single-precision operations that return a finite floating-point
result.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras de71a6119c FPU: Make FPSCR bit 11 always read as 0
Bit 11 (52 in BE numbering) is a reserved bit.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras ca792f3b13 FPU: Make convert-to-integer-word instructions behave like P9
The fctiw* instructions return a copy of the value in bits 31..0 in
bits 63..32 of the result on P9, rather than a sign or zero extension
of the word result.  Make the FPU do the same.
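The two layouts side by side, as a Python sketch on 64-bit integers. `fctiw_p9_style` and `sign_extend_word` are hypothetical helper names:

```python
MASK32 = 0xFFFF_FFFF


def fctiw_p9_style(word):
    """Place the 32-bit result in both halves of the 64-bit register."""
    word &= MASK32
    return (word << 32) | word


def sign_extend_word(word):
    """The alternative layout: sign-extend the word to 64 bits."""
    word &= MASK32
    return word | (0xFFFF_FFFF_0000_0000 if word & 0x8000_0000 else 0)


print(hex(fctiw_p9_style(0xFFFF_FFFE)))    # 0xfffffffefffffffe
print(hex(sign_extend_word(0xFFFF_FFFE)))  # 0xfffffffffffffffe
```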

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 82825a11ba FPU: Set result sign correctly for denorm +/- 0 case
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 37b1afc7f7 FPU: Make fri* instructions set FPSCR[FR,FI] to zero
As required by the ISA.  Also, never generate an inexact exception.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras dcd85164c6 FPU: Make fsel not alter FPSCR
fsel is a move-type instruction, and hence shouldn't affect FPSCR.
Set v.writing_fpr and v.instr_done, rather than setting arith_done,
to achieve this.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 066e38b8ea FPU: Do proper over/underflow handling for single-precision [fm]add
The ADD_3 state incorporated some of the logic of the FINISH state, but
in some cases assumed the result couldn't overflow or underflow - which
is not true for single precision operations, if the input operands are
outside the single precision range.  Fix this, and simplify things, by
having ADD_3 always go to FINISH state, which does the full overflow and
underflow checking.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras d540171f60 FPU: Ignore Rc bit for mffs* variants other than plain mffs
Bit 0 of the instruction is Rc for mffs but reserved for the other
mffs* instructions.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 0e11f80f2f FPU: Set FPSCR[FPRF] to zero for convert to integer operations
This seems to be what P9 does.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 2f29daab2d FPU: Fix setting of r.x for single-precision operations
The fp_rounding function expects r.x to have been set based on the lower
31 bits of r.r, not 29 as presently done, so change 28 to SP_RBIT-1
(SP_RBIT is 31).  Also add a test to check.
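A Python sketch of the difference, modelling r.r as an integer. Only SP_RBIT = 31 and the mask widths come from the commit message. `sticky_x` is a hypothetical helper. The example value has a single 1 in bit 29, which the old 29-bit mask (bits 28..0) misses:

```python
SP_RBIT = 31  # value given in the commit message


def sticky_x(r, low_bits):
    """True if any of the lowest `low_bits` bits of r are set."""
    return (r & ((1 << low_bits) - 1)) != 0


r = 1 << 29
print(sticky_x(r, 29))       # False: old behaviour, bits 28..0
print(sticky_x(r, SP_RBIT))  # True: bits SP_RBIT-1..0, i.e. 30..0
```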

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 577bbb8f5d tests/fpu: Add test case for denorm input in frsp test
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras ab3783b61b FPU: Fix setting of r.x
Having computed rormr, use it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 7b1febcbd3 tests/fpu: Check setting of FR and FI in FPSCR by frsp instruction
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras e60840eabc FPU: Make sure FR and FI in FPSCR get reset on special-case arith instructions
Arithmetic instructions where the result is determined without doing any
actual computation (i.e. the input(s) are NaNs, infinities, zeroes etc.)
weren't resetting FR and FI properly.  This combines the two blocks that
handle the r.cycle_1_ar = 1 case to fix it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 0b3df8ab00 bitsort: Fix bperm instruction (#456)
The result byte needs to be zero when the index byte value is >= 64.
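For reference, here is a Python sketch of `bpermd` as the Power ISA defines it. Each of the eight index bytes of RS selects one bit of RB, with bit 0 as the most significant. An index of 64 or more contributes a 0 bit. This is a model on plain integers, not the bitsort VHDL. The `index < 64` guard is the out-of-range case this fix concerns.

```python
def bpermd(rs, rb):
    """Model of bpermd: gather 8 bits of rb selected by the bytes of rs."""
    result = 0
    for i in range(8):
        # Index byte i of RS, counting from the most significant byte.
        index = (rs >> (56 - 8 * i)) & 0xFF
        # Bit `index` of RB in big-endian numbering; out of range gives 0.
        bit = (rb >> (63 - index)) & 1 if index < 64 else 0
        # Selected bit i lands in result bit 7 - i (LSB-0 numbering).
        result |= bit << (7 - i)
    return result


print(hex(bpermd(0x0001020304050607, 0xFF00_0000_0000_0000)))  # 0xff
print(hex(bpermd(0x4041424344454647, 0xFFFF_FFFF_FFFF_FFFF)))  # 0x0
```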

Fixes: 23ff954059 ("core: Change bperm to a simpler and slower implementation", 2025-01-07)

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras da695e7927 execute1: Fix bug where LPCR[HEIC] disabled interrupts in problem state (#453)
LPCR[HEIC] should only disable external interrupts in hypervisor mode,
and not in problem state (user mode).  This fixes the expression for
irq_valid to do that.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras fabe9a4feb Merge pull request #452 from paulusmack/master
SPR and interrupt bug fixes, implement LPCR[EVIRT], plus logic/timing improvements
3 months ago
Paul Mackerras 79e69d2a23 execute2: Simplify execute2 logic to improve timing
This aims to simplify the logic in the execute2_1 process.  It is not
really necessary to preserve the contents of ex2 when stalled, except
for ex2.e.last_nia; but when stalled, bits which would initiate
downstream actions, such as ex2.e.valid, ex2.e.interrupt and ex2.se,
should be cleared.

Also, the path through stage2_stall to the bypass valid signal has
shown up as a critical path.  This dependency is there because an
mfspr instruction to a slow SPR or a PMU SPR should not forward a
result before the instruction is about to complete, because the result
might change (for example when reading the timebase).  To avoid this
dependency, we simply don't forward results for mfspr to slow/PMU
SPRs.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 months ago
Paul Mackerras 9326fc7f18 tests/modes: Test that mfspr/mtspr to unimplemented SPR in user mode causes HEAI
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 months ago
Paul Mackerras 0255283159 tests/spr_read: Test that mfspr/mtspr to SPRs 0,4,5,6 generate HEAI
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 months ago
Paul Mackerras 5548a5ba26 execute1: Make mfspr/mtspr to SPRs 0,4,5,6 generate HEAI
The ISA specifies that mfspr or mtspr to SPR 0, 4, 5 or 6 should
generate a hypervisor emulation assistance interrupt in privileged
mode, so this adds logic to do that.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 months ago
Paul Mackerras 9c40ddffd2 execute1: Implement LPCR[EVIRT] bit
This implements the EVIRT bit in the LPCR register.  When set to 1,
EVIRT causes mfspr and mtspr to an undefined SPR number in privileged
mode (i.e. hypervisor mode) to cause a hypervisor emulation assistance
interrupt.  When set to 0, such instructions are executed as no-ops.
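The privileged-mode rules described in this log can be summarised as a small decision function. This is a Python sketch only: `privileged_spr_access` is a hypothetical helper, and whether an SPR number counts as defined is passed in by the caller rather than looked up. It encodes the EVIRT behaviour above and the SPR 0, 4, 5, 6 rule from commit 5548a5ba26.

```python
def privileged_spr_access(spr, spr_defined, evirt):
    """Outcome of mfspr/mtspr in privileged (hypervisor) mode."""
    if spr in (0, 4, 5, 6):
        return "HEAI"                        # always an emulation assist
    if not spr_defined:
        return "HEAI" if evirt else "no-op"  # controlled by LPCR[EVIRT]
    return "access"


print(privileged_spr_access(999, spr_defined=False, evirt=True))   # HEAI
print(privileged_spr_access(999, spr_defined=False, evirt=False))  # no-op
```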

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 months ago
Paul Mackerras 1d758f1d74 execute1: Simplify no-op behaviour of mfspr
When mfspr is performed to one of the reserved no-op SPRs, or to an
undefined SPR in privileged state, the behaviour is a no-op, that is,
the destination register is not written.  Previously this was done by
writing back the same value that the register had before the
instruction, but in fact it can be done simply by negating the write
enable signal so that the result GPR is not written.  This gives a
small reduction in logic complexity.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 months ago
Paul Mackerras 788f7a1755 core: Improve timing on bypass control paths
In order to improve timing, the bypass paths now carry the register
number being written as well as the tag.  The decisions about which
bypasses to use for which operands are then made by comparing the
register numbers rather than by determining a tag from the register
number and then comparing tags.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 months ago
Paul Mackerras f2166d326c tests/fpu: Add a test for result writing being suppressed
When an arithmetic instruction generates an invalid operation
exception or a divide by zero exception, and that exception is enabled
in the FPSCR, the writing of the result to the destination register
should be suppressed, leaving whatever value was last written in the
destination.  Add a check that this occurs correctly, for the cases of
square root of a negative number (invalid operation exception) and
division by zero (zero divide exception).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 months ago
Paul Mackerras 34cf092bf6 control: Fix forwarding when previous result write is suppressed
If we have two successive instructions that write the same result
register and then a third that uses the same register as an input, and
the second instruction suppresses the write of its result, we can
currently end up with the third instruction using the wrong value,
because it uses the register value from before the first instruction
rather than the result of the first instruction.  (An example of an
instruction suppressing the write of its result is a floating-point
instruction that generates an enabled invalid operation exception but
not an interrupt.)

To fix this, the control module now uses any forwarded value for the
register we want, not just the most recent value, but still stalls
until it has the most recent value, or the previous instruction
completes.  Thus in the case described above, decode2 will have
latched the value from the first instruction and so the third
instruction gets the correct value.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 months ago