105 Commits (6fe4b549f5bd08461f5062bcd4572b254f407884)

Author SHA1 Message Date
Paul Mackerras 6fe4b549f5 FPU: Improve accuracy in multiply-add almost-cancellation cases
There are two paths for multiply-add instructions; one where the
product is larger or nearly the same as the addend, which does the
addition/subtraction in the multiplier with 128-bit accuracy; the
other is used when the addend is clearly larger, which shifts the
product right before doing the addition/subtraction in 64-bit
arithmetic.  The threshold for the second path is that B_exp has
to be greater than A_exp + C_exp + 1, the +1 being because the product
mantissa can be greater than 2.

This increases the +1 to +2 to make sure that the 128-bit path is used
when there is any chance of cancellation of the high-order bits of the
sum.  With the +1 threshold we could still get close to cancellation
when the mantissas of A and C were nearly 2 and the mantissa of B was
1.  This improves accuracy and avoids the need to do a 120-bit
subtraction in the second path.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
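
Note on 6fe4b549f5: the path selection can be illustrated with a small Python sketch. Only the comparison `B_exp > A_exp + C_exp + margin` comes from the commit message; the function names and the `margin` parameter are hypothetical. The operands follow the case described in the message: A and C with mantissas just below 2, and B with a mantissa of 1.

```python
import math
from fractions import Fraction

def unbiased_exponent(x):
    """Exponent e with abs(x) = m * 2**e and 1 <= m < 2 (finite, non-zero x)."""
    return math.frexp(x)[1] - 1

def uses_addend_path(a, b, c, margin):
    """True if the addend B is treated as clearly larger than the product A*C.

    Hypothetical helper: models only the exponent comparison from the message.
    """
    return unbiased_exponent(b) > unbiased_exponent(a) + unbiased_exponent(c) + margin

a = c = 2.0 - 2.0 ** -52   # mantissa just below 2, exponent 0
b = 4.0                    # mantissa 1, exponent 2

old_path = uses_addend_path(a, b, c, margin=1)   # threshold before the commit
new_path = uses_addend_path(a, b, c, margin=2)   # threshold after the commit

# Exact value of B - A*C: tiny compared with B, so the high-order bits cancel.
difference = Fraction(b) - Fraction(a) * Fraction(c)
```

With the +1 margin these operands would take the 64-bit addend path even though the subtraction almost cancels; with the +2 margin they stay on the 128-bit path.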
Paul Mackerras 80c81b58ef FPU: Generate correct result sign when B is denormal
If a subtraction A - B is done where A is in normalized form with an
exponent of -1022, and B is denormal, an inconsistency arises between
the comparison of the raw exponents in the first cycle, which sees
A.exp (0x001) > B.exp (0x000), and the comparison in DO_FADD state,
which sees r.a.exponent (-1022) = r.b.exponent (-1022).  Consequently
we get r.add_bsmall = 0 and the subtraction is done the wrong way
around, yielding the wrong sign for the result.

Fix this by setting r.add_bsmall according to the comparison of raw
exponents in the first cycle and then using it in DO_FADD state.
Also add a test case for this.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
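
Note on 80c81b58ef: the mismatch between raw and effective exponents can be reproduced on IEEE 754 doubles in Python. The helper names are hypothetical, and the sketch says nothing about how the VHDL computes these values; it only shows why the two comparisons disagree.

```python
import struct

def raw_exponent(x):
    """Biased 11-bit exponent field of a double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF

def effective_exponent(x):
    """Unbiased exponent of a finite double; denormals use -1022, like raw exponent 1."""
    return max(raw_exponent(x), 1) - 1023

a = 2.0 ** -1022   # smallest normalized double, raw exponent 0x001
b = 2.0 ** -1023   # denormal, raw exponent 0x000

raw_says_a_bigger = raw_exponent(a) > raw_exponent(b)
effective_equal = effective_exponent(a) == effective_exponent(b)
```

Both statements hold at once, which is the inconsistency the commit removes by deciding `r.add_bsmall` from the raw comparison only.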
Paul Mackerras f631dcd700 FPU: Set FPRF correctly on multiply result that underflows
rcls_op being set to RCLS_TZERO was not detecting a zero result after
rounding for a multiply result that underflows, because S still had
low bits of the product.  To fix this, remove the 's_nz = 0' from the
RCLS_TZERO test.  We can't then use this test in the FMADD_6 state,
but we really shouldn't be testing for zero there, before rounding,
so remove that.  Also simplify FMADD_6 state by not setting rs_norm
and going always to FINISH state rather than going to NORMALIZE state.

Add a test for this case (actually a fmadd with B=0).

While here, remove a pointless assignment to f_to_multiply.valid in
MULT_1 state, since r.first is never set here.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras b122577a4e FPU: Be more careful about preserving low-order bits in multiply-add instrs
Add code to check whether bits of S which don't get shifted into R are
non-zero, and set X if they are, so that rounding in multiply-add
instructions works correctly.  This needs to be done after normalization
in the case of very small results, where potentially all the non-zero
bits in S do get shifted into R.

Also fix an incorrect test case, and add another multiply-add test case.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 59992eab90 FPU: Avoid doing overflow processing twice in OE=1 case
Split the ROUND_OFLOW state into two, one which handles the OE=0 case
(disabled overflow exception) and one which handles the OE=1 case
(enabled overflow exception).  This avoids a loop in the state diagram
and prevents us from adding the exponent bias twice.

Also correct a bug in ROUNDING_3 state where for single-precision
operations which yield a result which is denormal in double-precision
format, r.shift was set wrongly.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras b8f7cbd894 FPU: Record bits shifted out of addend in fmadd-family instructions
If the addend is smaller than the product and thus needs to be shifted
right, record if any bits are lost from the right end in r.x, so that
the result gets rounded correctly.

Also add a test that checks one such case.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
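
Note on b8f7cbd894: recording lost bits during a right shift is the usual "sticky bit" technique. A minimal sketch on non-negative Python integers follows; the function name is hypothetical and the returned flag stands in for `r.x`.

```python
def shift_right_sticky(value, shift):
    """Shift a non-negative integer right and report whether any 1 bits were lost."""
    lost = value & ((1 << shift) - 1)   # the bits that fall off the right end
    return value >> shift, lost != 0

exact = shift_right_sticky(0b1011000, 3)     # no bits lost
inexact = shift_right_sticky(0b1011001, 3)   # the low 1 bit is lost
```

The two inputs produce the same shifted value, so without the flag the rounding step could not tell them apart.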
Paul Mackerras 32919435a3 FPU: Allow mtfsb* to set FPSCR[FX] implicitly
If mtfsb1 causes an individual exception bit to go from 0 to 1, that
should set FX as well.  Arrange for this by setting update_fx to 1.
Also make sure mcrfs doesn't copy the reserved FPSCR bit.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
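
Note on 32919435a3: the implicit FX update can be sketched as below. The bit positions follow the Power ISA FPSCR layout in BE numbering (FX at bit 32; OX, UX, ZX, XX at bits 35 to 38). `EXCEPTION_BITS` is deliberately an illustrative subset of the exception bits, and the function names are hypothetical.

```python
def be_bit(n):
    """Mask for bit n of a 64-bit register in big-endian numbering (bit 0 is the MSB)."""
    return 1 << (63 - n)

FX = be_bit(32)
# Illustrative subset of the exception bits: OX, UX, ZX, XX.
EXCEPTION_BITS = be_bit(35) | be_bit(36) | be_bit(37) | be_bit(38)

def mtfsb1(fpscr, bt):
    """Set FPSCR bit bt + 32; also set FX if an exception bit goes from 0 to 1."""
    mask = be_bit(bt + 32)
    if (mask & EXCEPTION_BITS) and not (fpscr & mask):
        fpscr |= FX
    return fpscr | mask

ZX = be_bit(37)
first_set = mtfsb1(0, 5)      # ZX goes 0 -> 1, so FX is set as well
already_set = mtfsb1(ZX, 5)   # ZX was already 1, so FX is left alone
```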
Paul Mackerras de71a6119c FPU: Make FPSCR bit 11 always read as 0
Bit 11 (52 in BE numbering) is a reserved bit.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
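
Note on de71a6119c: the two bit numbers in the message are related by `le = 63 - be` for a 64-bit register. A short sketch of the conversion and of masking the reserved bit on reads, with hypothetical names:

```python
def be_to_le_bit(be_bit, width=64):
    """Convert a big-endian bit number (0 = MSB) to a little-endian one (0 = LSB)."""
    return width - 1 - be_bit

RESERVED_MASK = 1 << be_to_le_bit(52)

def read_fpscr(raw):
    """Return the FPSCR value with the reserved bit forced to 0."""
    return raw & ~RESERVED_MASK
```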
Paul Mackerras 2f29daab2d FPU: Fix setting of r.x for single-precision operations
The fp_rounding function expects r.x to have been set based on the lower
31 bits of r.r, not 29 as presently done, so change 28 to SP_RBIT-1
(SP_RBIT is 31).  Also add a test to check.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
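
Note on 2f29daab2d: the effect of the wrong bit range can be shown with a value whose only set bit lies between the old and new ranges. `SP_RBIT = 31` is taken from the message; the `sticky` helper and the test value are hypothetical.

```python
SP_RBIT = 31   # value stated in the commit message

def sticky(r, nbits):
    """True if any of the low nbits bits of r are set."""
    return (r & ((1 << nbits) - 1)) != 0

r = 1 << 29                  # only bit 29 is set
old_x = sticky(r, 29)        # bits 28..0, as before the fix
new_x = sticky(r, SP_RBIT)   # bits SP_RBIT-1..0, as after the fix
```

Here `old_x` is false and `new_x` is true, so the old range would have treated an inexact single-precision result as exact.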
Paul Mackerras 577bbb8f5d tests/fpu: Add test case for denorm input in frsp test
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 7b1febcbd3 tests/fpu: Check setting of FR and FI in FPSCR by frsp instruction
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 9326fc7f18 tests/modes: Test that mfspr/mtspr to unimplemented SPR in user mode causes HEAI
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 months ago
Paul Mackerras 0255283159 tests/spr_read: Test that mfspr/mtspr to SPRs 0,4,5,6 generate HEAI
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 months ago
Paul Mackerras f2166d326c tests/fpu: Add a test for result writing being suppressed
When an arithmetic instruction generates an invalid operation
exception or a divide by zero exception, and that exception is enabled
in the FPSCR, the writing of the result to the destination register
should be suppressed, leaving whatever value was last written in the
destination.  Add a check that this occurs correctly, for the cases of
square root of a negative number (invalid operation exception) and
division by zero (zero divide exception).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 months ago
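
Note on f2166d326c: the behaviour being tested can be modelled roughly as follows. This is a sketch of the architectural rule described in the message, not of the test code; the function names are hypothetical, and only the two cases named in the message are modelled.

```python
import math

def fdiv_model(dest_old, a, b, ze_enabled):
    """Model of fdiv for the zero-divide case (finite non-zero a, zero b)."""
    if b == 0.0 and a != 0.0 and math.isfinite(a):
        if ze_enabled:
            return dest_old   # result write suppressed
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)
    if b == 0.0:
        # 0/0, inf/0 and NaN/0 are outside the scope of this sketch.
        raise NotImplementedError("only the zero-divide case is modelled")
    return a / b

def fsqrt_model(dest_old, x, ve_enabled):
    """Model of fsqrt for the negative-operand invalid operation case."""
    if x < 0.0:
        return dest_old if ve_enabled else math.nan
    return math.sqrt(x)
```

With the exception enabled, both functions return the old destination value unchanged, which is what the new test checks for.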
Paul Mackerras 9f9f9046ee tests/spr_read: Add a check for no-op behaviour of mtspr and mfspr
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 months ago
Paul Mackerras 9ac71cfbf2 tests/fpu: Add more floating multiply-add tests
Add more tests to check that the result sign computations are correct.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 8f537c13bc tests: Add a test for the hash instructions hash{st,cmp}[p]
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
11 months ago
Paul Mackerras 80bc9d5098 tests/trace: Add a few tests of DAWR (data watchpoint) functionality
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
11 months ago
Paul Mackerras 09de0738de tests/trace: Add checks for SIAR and SDAR being set correctly
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
11 months ago
Paul Mackerras 23b183fb16 tests/reservation: Check that SRR0 is set correctly on alignment interrupt
The tests that intentionally generate alignment interrupts now also
check that SRR0 is pointing to a l*arx or st*cx instruction.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
11 months ago
Paul Mackerras f64ab6569d tests/trace: Add a couple of tests of CIABR function
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
11 months ago
Paul Mackerras 140b930ad3 tests: Add tests for lq/stq, plq/pstq and lqarx/stqcx.
Lq and stq are tested in both BE and LE modes (though only 64-bit
mode) by the 'modes' test.

Lqarx and stqcx. are tested by the 'reservation' test in LE mode
(64-bit).

Plq and pstq are tested in 64-bit LE mode by the 'prefix' test.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 year ago
Paul Mackerras d2777dd1dd Generate Hypervisor Emulation Assistance Interrupt for illegal instructions
This implements the HEIR register (Hypervisor Emulation Instruction
Register) and arranges for an illegal instruction to cause a
Hypervisor Emulation Assistance Interrupt (HEAI) at vector 0xE40, and
set HEIR to the illegal instruction.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 year ago
Paul Mackerras 12a3d76217 Implement hrfid and make MSR[HV] always 1
Implementations without hypervisor/LPAR support are permitted by the
architecture, but should have MSR[HV] forced to be 1 at all times, not
0, and should implement various instructions and registers that are
only accessible in hypervisor mode.

This commit implements MSR[HV] as a constant 1 bit and adds the hrfid
instruction, which behaves exactly the same as rfid except that it
reads HSRR0/1 instead of SRR0/1.  We already have HSRR0/1 and HSPRG0/1
implemented.

When HV=1, Linux expects external interrupts to arrive as hypervisor
interrupts, so this adds support for hypervisor interrupts (i.e.,
those that set HSRR0/1) and makes the external interrupt be a
hypervisor interrupt.  (If we had an LPCR register, the LPES bit would
control this, but we don't.)  The xics test is updated to read HSRR0/1
after an external interrupt.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 year ago
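
Note on 12a3d76217: a rough model of the two ideas in this commit, a constant-1 MSR[HV] and an `hrfid` that differs from `rfid` only in its source registers. MSR[HV] is bit 3 in BE numbering, i.e. bit 60 counting from the LSB. The function names and the dictionary of SPRs are hypothetical, and the MSR bit masking that the real instructions perform is ignored.

```python
MSR_HV = 1 << 60   # MSR[HV]: BE bit 3 of the 64-bit MSR

def write_msr(value):
    """Any value written to the MSR reads back with HV forced to 1."""
    return value | MSR_HV

def return_from_interrupt(sprs, hypervisor):
    """Model rfid (hypervisor=False) or hrfid (hypervisor=True).

    Returns (next instruction address, new MSR).
    """
    if hypervisor:
        nia, msr = sprs["HSRR0"], sprs["HSRR1"]
    else:
        nia, msr = sprs["SRR0"], sprs["SRR1"]
    return nia & ~0x3, write_msr(msr)

sprs = {"SRR0": 0x1000, "SRR1": 0x1, "HSRR0": 0x2003, "HSRR1": 0x2}
```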
Paul Mackerras 7f781b835d tests/fpu: Add tests for ftdiv and ftsqrt
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 7b86bf8863 tests/fpu: Add tests for fdiv and fre with denormalized operands
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 59a7996f1c tests/fpu: Add checks for correct setting of FPRF
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 7c5a2bcaf4 tests: Add a test for prefixed instructions
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Michael Neuling 116f6281a9 tests: Update metavalues test count
With Paulus's changes in PR #396 merged in 5c6d57de30, we can now reduce
the metavalue test counts.

Signed-off-by: Michael Neuling <mikey@neuling.org>
3 years ago
Michael Neuling 0073d23e73 Merge pull request #392 from paulusmack/fix-branch-alias
fetch1: Fix bug where BTC entries don't match on MSR[IR]
3 years ago
Anton Blanchard 25f93fc17e Add branch alias test
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 3c27abcc40 tests/trace: Test trace vs system call interrupt
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Michael Neuling eeac86c9d8 test: Add test for metavalues
Make sure they don't increase in future

Signed-off-by: Michael Neuling <mikey@neuling.org>
3 years ago
Michael Neuling 72fcca8e52 tests: Update FPU test output
The following commit added two tests but didn't update the test
outputs:

    commit 73cc5167ec
    Author: Paul Mackerras <paulus@ozlabs.org>
    Date:   Mon May 9 19:18:42 2022 +1000
    Use FPU for division instructions if we have an FPU

This patch updates these using tests/update_console_tests.

Signed-off-by: Michael Neuling <mikey@neuling.org>
3 years ago
Michael Neuling 281a125f1f Merge pull request #379 from paulusmack/master
Lots of improvements
3 years ago
Michael Neuling a060ad5085 tests/pmu: Cleanup whitespace in pmc.c
Fixup tabs vs space and trailing whitespace.

Signed-off-by: Michael Neuling <mikey@neuling.org>
3 years ago
Paul Mackerras 73cc5167ec Use FPU for division instructions if we have an FPU
- Arrange for XER to be written for OE=1 forms
- Arrange for condition codes to be set for RC=1 forms
  (including correct handling for 32-bit mode)
- Don't instantiate the divider if we have an FPU.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras c9e838b656 Remove support for lq, stq, lqarx and stqcx.
They are optional in SFFS (scalar fixed-point and floating-point
subset), are not needed for running Linux, and add complexity, so
remove them.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Iago Caran Aquino de1bf10114 tests/pmu: Add load/store completed, instruction count and cycle count tests
Signed-off-by: Iago Caran Aquino <iago.caran@gmail.com>
3 years ago
Anton Blanchard 2d142a6c01 tests/misc: Add a store/dcbz test
We have a bug where a store near a dcbz can cause the dcbz to only zero
8 bytes. Add a test case for this.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
4 years ago
Anton Blanchard 00259458c7 tests/misc: Add an icbi test
We have a bug where an icbi can cause an instruction to execute twice.
Add a test case for this.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
4 years ago
Paul Mackerras ba34914465 tests/misc: Add a test for a load that hits two preceding stores
This checks that the store forwarding machinery in the dcache
correctly combines forwarded stores when they are partial stores
(i.e. only writing part of the doubleword, as for a byte store).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
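
Note on ba34914465: what "correctly combines forwarded stores" means can be sketched with byte masks. This models the expected result of the load only, not the dcache pipeline; the function names and the `(data, byte_mask)` representation are hypothetical.

```python
def apply_store(dword, data, byte_mask):
    """Overwrite the bytes of dword selected by byte_mask (bit i selects byte i)."""
    for i in range(8):
        if (byte_mask >> i) & 1:
            lane = 0xFF << (8 * i)
            dword = (dword & ~lane) | (data & lane)
    return dword

def load_with_forwarding(memory_dword, pending_stores):
    """Value a load should see; pending_stores is oldest first, as (data, byte_mask)."""
    for data, byte_mask in pending_stores:
        memory_dword = apply_store(memory_dword, data, byte_mask)
    return memory_dword

# Two byte stores to different bytes of the same doubleword, then a load.
result = load_with_forwarding(
    0x1111111111111111,
    [(0x00000000000000AA, 0x01), (0x000000000000BB00, 0x02)],
)
```

The load has to see both stored bytes plus the six untouched bytes from memory.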
Paul Mackerras f40842d9b2 tests/fpu: Test FPU unavailable interrupt following a load
This adds a load before a floating-point load which should generate a
floating-point unavailable interrupt, to test for the bug where
unavailability interrupts can get dropped while loadstore1 is
executing instructions.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
Paul Mackerras 18120f153d MMU: Implement a vestigial partition table
This implements a 1-entry partition table, so that instead of getting
the process table base address from the PRTBL SPR, the MMU now reads
the doubleword pointed to by the PTCR register plus 8 to get the
process table base address.  The partition table entry is cached.

Having the PTCR and the vestigial partition table reduces the amount
of software change required in Linux for Microwatt support.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
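
Note on 18120f153d: the lookup described here is a single cached memory read. A minimal sketch follows, with a hypothetical class name and a caller-supplied `read_dword` function standing in for memory. Invalidating the cache on a PTCR write is an assumption of the sketch, not something the commit message states, and any masking of PTCR fields is ignored.

```python
class VestigialPartitionTable:
    """One-entry partition table: process table base = doubleword at PTCR + 8."""

    def __init__(self, read_dword):
        self.read_dword = read_dword
        self.ptcr = 0
        self.cached_prtb = None

    def write_ptcr(self, value):
        self.ptcr = value
        self.cached_prtb = None   # assumption: a PTCR write drops the cached entry

    def process_table_base(self):
        if self.cached_prtb is None:
            self.cached_prtb = self.read_dword(self.ptcr + 8)
        return self.cached_prtb

reads = []
memory = {0x10008: 0xABC000}

def read_dword(addr):
    reads.append(addr)
    return memory[addr]

mmu = VestigialPartitionTable(read_dword)
mmu.write_ptcr(0x10000)
first = mmu.process_table_base()
second = mmu.process_table_base()   # served from the cached entry
```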
Anton Blanchard d26a157cd7 Add a test to read from all SPRs
Make sure the SPRs are initialized and we can't read X state.

(Mikey: rebased and added console/bin file for testing)

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
5 years ago
Paul Mackerras ec5730a75a tests: Add tests for lq/stq and lqarx/stqcx.
Lq and stq are tested in both BE and LE modes (though only 64-bit
mode) by the 'modes' test.

Lqarx and stqcx. are tested by the 'reservation' test in LE mode
(64-bit).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Paul Mackerras 29fabeb12e tests/misc: Add a test for correct CTR and LR updating by branches
This adds a test with a bdnzl followed immediately by a bdnz, to check
that CTR and LR both get evaluated and written back correctly in this
situation.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
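
Note on 29fabeb12e: the expected architectural effect of the two back-to-back branches can be modelled as below. The state dictionary, the use of absolute targets and the addresses are hypothetical simplifications; the sketch describes what the registers should contain, not how the pipeline gets there.

```python
MASK64 = (1 << 64) - 1

def bdnz(state, target, link=False):
    """Decrement CTR, set LR if link is set, and branch if CTR is non-zero."""
    cia = state["pc"]
    state["ctr"] = (state["ctr"] - 1) & MASK64
    if link:
        state["lr"] = cia + 4   # written whether or not the branch is taken
    state["pc"] = target if state["ctr"] != 0 else cia + 4

state = {"pc": 0x100, "ctr": 3, "lr": 0}
bdnz(state, 0x200, link=True)   # bdnzl
bdnz(state, 0x300)              # bdnz immediately afterwards
```

After both branches CTR has been decremented twice and LR still holds the address following the `bdnzl`.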
Paul Mackerras 144433218f tests/trace: Test trace interrupt vs. FP unavailable interrupt
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Paul Mackerras dc1544db69 FPU: Implement floating multiply-add instructions
This implements fmadd, fmsub, fnmadd, fnmsub and their
single-precision counterparts.  The single-precision versions operate
the same as the double-precision versions until the final rounding and
overflow/underflow steps.

This adds an S register to store the low bits of the product.  S
shifts into R on left shifts, and can be negated, but doesn't do any
other arithmetic.

This adds a test for the double-precision versions of these
instructions.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
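
Note on dc1544db69: the four operations differ only in the sign of the addend and of the result, and all of them round once. A reference model using exact rational arithmetic is sketched below. The function names are hypothetical; only finite operands are handled, and signed zeros and NaNs are not modelled.

```python
from fractions import Fraction

def fused(a, c, b, subtract=False, negate=False):
    """Compute +/-(a*c +/- b) exactly, then round once to double precision."""
    addend = -Fraction(b) if subtract else Fraction(b)
    result = float(Fraction(a) * Fraction(c) + addend)   # single rounding
    return -result if negate else result

def fmadd(a, c, b):  return fused(a, c, b)
def fmsub(a, c, b):  return fused(a, c, b, subtract=True)
def fnmadd(a, c, b): return fused(a, c, b, negate=True)
def fnmsub(a, c, b): return fused(a, c, b, subtract=True, negate=True)

a = 1.0 + 2.0 ** -52
c = 1.0 - 2.0 ** -52
b = 1.0

fused_result = fmsub(a, c, b)   # keeps the low bits of the product
unfused_result = a * c - b      # the product is rounded to 1.0 first
```

The exact product is 1 - 2**-104, so the fused result is -2**-104 while the unfused computation returns 0.0. This is why the low bits of the product have to be kept somewhere, which the commit does with the S register.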
Paul Mackerras c350bc1f25 FPU: Implement fsqrt[s] and add a test for fsqrt
This implements the floating square-root calculation using a table
lookup of the inverse square root approximation, followed by three
iterations of Goldschmidt's algorithm, which gives estimates of both
sqrt(FRB) and 1/sqrt(FRB).  Then the residual is calculated as
FRB - R * R and that is multiplied by the 1/sqrt(FRB) estimate to get
an adjustment to R.  The residual and the adjustment can be negative,
and since we have an unsigned multiplier, the upper bits can be wrong.
In practice the adjustment fits into an 8-bit signed value, and the
bottom 8 bits of the adjustment product are correct, so we sign-extend
them, divide by 4 (because R is in 10.54 format) and add them to R.

Finally the residual is calculated again and compared to 2*R+1 to see
if a final increment is needed.  Then the result is rounded and
written back.

This implements fsqrts as fsqrt, but with rounding to single precision
and underflow/overflow calculation using the single-precision exponent
range.  This could be optimized later.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
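
Note on c350bc1f25: the overall shape of the calculation (seed, three Goldschmidt iterations, one residual-based adjustment) can be sketched in floating point. This does not reproduce the fixed-point formats, the sign-extension of the adjustment or the final comparison against 2*R+1 described above. The seed passed in stands in for the lookup table, and the function name is hypothetical.

```python
import math

def goldschmidt_sqrt(b, seed, iterations=3):
    """Return estimates of (sqrt(b), 1/sqrt(b)) from a seed approximating 1/sqrt(b)."""
    g = b * seed    # converges to sqrt(b)
    h = seed / 2    # converges to 1 / (2 * sqrt(b))
    for _ in range(iterations):
        r = 0.5 - g * h
        g += g * r
        h += h * r
    residual = b - g * g
    g += residual * h   # first-order correction: residual / (2 * sqrt(b))
    return g, 2 * h

def table_seed(b):
    """Stand-in for the lookup table: 1/sqrt(b) rounded to a multiple of 1/256."""
    return round(256 / math.sqrt(b)) / 256

root, inv_root = goldschmidt_sqrt(2.0, table_seed(2.0))
```

Each iteration roughly squares the relative error of the estimate, so a seed good to about 8 bits is enough for three iterations to reach double precision.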