Commit Graph

1541 Commits (master)
 

Author SHA1 Message Date
Paul Mackerras efd0571b5f
Merge pull request #461 from paulusmack/master
Improvements for the Arty A7 board
2 days ago
Paul Mackerras 81792f599b arty a7: Connect SD card interface to microSD socket on LCD touchscreen board
If the generic USE_LCD is false, the first SD card controller (mmcblk0
in Linux) is connected to pmod HA; if USE_LCD is true, it is connected
to the SD card slot on the touchscreen/LCD panel.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 days ago
Paul Mackerras 185008c907
Merge pull request #460 from paulusmack/fixes
Fix icache and dcache bugs

- Fix icache bug causing spurious ISI interrupts
- Fix dcache bug causing corrupted load data
3 days ago
Paul Mackerras c7531e592c arty a7: Add facilities to get A/D conversions from the touchscreen
This adds connections from the A2 - A5 inputs on the Arty A7 to the
XADC module in the Artix-7 plus a way for software to access the XADC
via its DRP port, and a status register to tell software when
conversion sequences are done.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 days ago
Paul Mackerras 172eae61cb arty a7: Add an interface for a TFT LCD touchscreen
This adds an interface for an Arduino-compatible LCD touchscreen.  The
screen module plugs directly on to the Arduino/chipKit shield
connector on the Arty A7.  Unfortunately, the slightly strange way the
resistive touchscreen is brought out (connected to the D0, D1, RS and
CS pins) combined with the 200 ohm protection resistors on the Arty
board mean that some hardware hacks to the module are necessary.  I
rewired mine so that D0 and D1 are on the A4 and A5 pins and the reset
is where D0 was (shield I/O 8).

This interface is suitable for boards with a HX8347 driver chip.  The
timing may not be quite suitable for other driver chips.

The interface is a byte which can be read and written at 0xc8050000,
containing an index register, and a 1-8 byte data register at
0xc8050008.  Reading at offsets 1 to 7 from those addresses yields the
same value as at offset 0.  Writing 64 bits to the data register
writes the bytes at offset 1, 0, 3, 2, 5, 4, 7, 6 in that order to the
driver chip.  This allows pixel data to be transferred using 64-bit
writes, ending up in the frame buffer in the expected order (for
16-bit pixels, the driver chip expects MS byte then LS byte).  32-bit
writes do 1, 0, 3, 2, and 16-bit writes do 1, 0.
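
The byte ordering described above can be modelled with a short sketch.
The function names are hypothetical, and the sketch assumes a
little-endian CPU, so that offset 0 holds the LS byte of the value
being written; it only covers the 16-, 32- and 64-bit cases that the
text spells out.

```python
def lcd_byte_order(width_bytes):
    """Return byte offsets in the order they are sent to the driver chip.

    Each pair of bytes is swapped (1, 0, 3, 2, ...), as described for
    16-, 32- and 64-bit writes to the data register.
    """
    if width_bytes not in (2, 4, 8):
        raise ValueError("only 2-, 4- and 8-byte writes are modelled here")
    order = []
    for pair in range(0, width_bytes, 2):
        order.extend([pair + 1, pair])
    return order


def lcd_stream(value, width_bytes):
    """Bytes reaching the driver for a little-endian write of `value`."""
    data = value.to_bytes(width_bytes, "little")  # assumed endianness
    return bytes(data[i] for i in lcd_byte_order(width_bytes))


# A single 16-bit pixel 0x1234 goes out MS byte first.
pixel_bytes = lcd_stream(0x1234, 2)
```

Under that assumption, packing four 16-bit pixels into one 64-bit
value with the first pixel in the low-order bits sends them to the
driver in frame-buffer order.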

The touchscreen support so far is a 1-byte register containing bits to
set RS, D0, D1 and CS high or low or make them tri-state.  There is
nothing to do analog conversions of the signal levels at this stage.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 days ago
Paul Mackerras 7f4e0185b5 xilinx_mult: Eliminate a Vivado warning
Since the p1 instance of DSP48 has CREG = 0, we should ground the CEC
input, as mentioned in a Vivado warning.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 days ago
Paul Mackerras d4fec95044 arty a7: Turn on LED 5 when SD card command-done interrupt is enabled
This snoops writes to the interrupt enable registers of the SD card
interfaces and records whether the command-done interrupt is enabled.
LED 5 is turned on whenever either interface has this interrupt enabled
in order to serve as a disk activity indicator.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 days ago
Paul Mackerras dcd1072c25 arty a7: Put the top 8 GPIOs on pmod B
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 days ago
Paul Mackerras 6100e7b50e dcache: Fix another dcache bug causing occasional load data corruption
Commit a7420c2a4d ("dcache: Fix bug causing load to return incorrect
data", 2025-12-27) fixed the main cause of the bug, but left a 1-cycle
window where the same problem could still occur.  If a touch that misses
in the dcache is followed immediately by a load to a different cache
line with the same index, then because the touch is completed and the
new tag is written for the line being touched in the same cycle, it is
possible for the following load to use the previous (stale) tag value
for the line.  If that old value matches the load (i.e., the load would
have been a hit in the absence of the touch) then the load will
incorrectly return data from the line being touched.

Fix this by delaying the completion of the touch until after the new
tag has been written, which is indicated by r1.write_tag = 0.

Fixes: a7420c2a4d ("dcache: Fix bug causing load to return incorrect data", 2025-12-27)
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 days ago
Paul Mackerras a1d83ba91a tests/mmu: Add a test for a faulting load near the end of a page
This tests for the bug where a load near the end of a page, if the load
faults and the following page isn't mapped, could cause a DSI followed
incorrectly by an ISI shortly afterwards.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 days ago
Paul Mackerras 41e341f260 icache: Clear fetch failed flag on flush
This fixes a bug where a load that results in a DSI, if it is placed
near the end of a page and the following page isn't mapped, can
result in the core starting to take the DSI but then jumping off to
the ISI vector.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 days ago
Paul Mackerras 16c3eda1b1 arty a7: Rework status LED colours
This frees up LEDs 4 and 5 by combining their status functions into
LED 0, which is now black when the system is in reset and yellow when
the system clock is not locked.  On configurations without litedram,
LED 0 now shows green rather than magenta.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 weeks ago
Paul Mackerras 90df07b950 arty a7: Add connection to i2c RTC chip on port JD
The I2C data is on GPIO 22 and the clock is on GPIO 23.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 weeks ago
Paul Mackerras 4f06a01731 arty a7: Add a second SD card interface on pmod JC
This adds a second SD card interface.  The main complexity is in
providing a wishbone switch/arbiter to multiplex the two DMA
wishbones from the two interfaces to a single wishbone going to
the soc module.  There is a new syscon info reg bit to indicate the
presence of the second litesdcard.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 weeks ago
Paul Mackerras 6366fbb5a7 arty a7: Simplify GPIO connections
Currently, GPIO lines 0 - 8 drive three of the 3-colour LEDs on
output, but on input read the state of the pins labelled IO10 - IO13,
IO26 - IO29 and IO8 on the Arty board.  Then GPIO lines 10 - 17 drive
IO10 - IO13 and IO26 - IO29 on output, but on input read the 4 buttons
and 4 switches.  To simplify all this and prepare for future changes,
this just detaches IO8, IO10 - IO13 and IO26 - IO29, so now GPIO 0 - 8
read 0 on input, and GPIO 10 - 17 do nothing on output.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 weeks ago
Paul Mackerras 8339795d0c
Merge pull request #459 from paulusmack/fixes
Bug fixes for the FPU and dcache
2 weeks ago
Paul Mackerras 6eaf22ea95 dcache: Fix stalls that occurred occasionally with dcbt followed by ld
This fixes a race condition that causes a hang in a situation where the
program does a dcbt to a cache line, then hits a TLB miss causing some
requests to come in to the dcache from the MMU while the cache line
requested by the dcbt has not yet started to come in, then does a load
to an address in the same cache line requested by the dcbt.  If it
happens that the data for the load arrives in the same cycle that the
load is doing the cache tag and TLB lookups, the dcache_slow process
correctly recognizes that the request can be satisfied immediately
but incorrectly sends the done signal to the MMU rather than loadstore1,
because the logic looks at r1.mmu_req not req.mmu_req.  Fix it to use
req.mmu_req.

Also make sure that RELOAD_WAIT_ACK state only completes a touch that
was the one that caused entry to RELOAD_WAIT_ACK state, not a
subsequent touch, which will have r1.req.hit_reload = 0.  (A touch to
the same line that is already being reloaded would be treated as a hit.)

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 weeks ago
Paul Mackerras fdd98d88d4 FPU: Fix zero result detection in fmadd-family instructions
With the multiply-add instructions, it is possible to get into state
FMADD_6 with R containing a value >= 8.0.  If the value is exactly
8.0, the logic will incorrectly conclude that the result is zero
because it only tests bits up to UNIT_BIT + 2.  Fix this by testing
up to UNIT_BIT + 3, and add a test case to the FPU test that triggers
this situation.
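
A rough model of the failure, not the actual VHDL: the value of
UNIT_BIT below is an assumption chosen for illustration, and R is
treated as a plain integer in which bit UNIT_BIT has weight 1.0.

```python
UNIT_BIT = 56  # assumed position of the 1.0 bit in R; illustrative only


def is_zero(r, top_bit):
    """Zero test that only inspects bits 0 .. top_bit of r."""
    mask = (1 << (top_bit + 1)) - 1
    return (r & mask) == 0


# 8.0 has a single set bit, at position UNIT_BIT + 3.
eight = 8 << UNIT_BIT

old_test = is_zero(eight, UNIT_BIT + 2)  # misses the set bit
new_test = is_zero(eight, UNIT_BIT + 3)  # sees the set bit
```

With the narrower test, a value of exactly 8.0 looks like zero because
its only set bit lies just above the tested range.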

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 weeks ago
Paul Mackerras d02e8e6f93
Merge pull request #458 from paulusmack/fixes
Fixes for bugs found in dcache, loadstore1 and execute1.
4 weeks ago
Paul Mackerras 84eebf5c7c execute1: Fix bug causing SRR0 to be set to 4 more than the correct value
If an scv (or sc) instruction is executed and an asynchronous
interrupt occurs on the following instruction (e.g. the first
instruction of the scv handler), the address written to SRR0 will be
the address of that following instruction + 4.  The reason is that
ex1.advance_nia will still be set from the execution of the sc[v].
Fix this by clearing v.advance_nia in execute1_1.

(This only shows up for asynchronous interrupts with scv, not sc,
because sc clears MSR[EE].  It should show up for synchronous
interrupts with both sc and scv, but that has not been demonstrated.)

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 month ago
Paul Mackerras aadd22267f execute1: Don't increment the LOG_ADDR SPR after reading it
Reading the LOG_DATA SPR is supposed to increment the log address,
and reading LOG_ADDR is not supposed to, but currently this is the
wrong way around.  Fix it.  Also add a related comment.

Fixes: 8f7326a824 ("core: Implement various SPRs which read zero and ignore writes", 2025-04-10)
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 month ago
Paul Mackerras a7420c2a4d dcache: Fix bug causing load to return incorrect data
If a touch is immediately followed by a load to a different address
which has the same index as the touch address, and both are cache
misses, it is possible for the load to be treated as if it is to the
same cache line as the touch, and thus return data from the line being
touched rather than the line being loaded from.  For example, if the
touch is to 0x1c20 and the load is to 0x2c20, and the state left in
r1.store_ways by an earlier operation happens to match the PLRU victim
way, the load will return data from 0x1c20.
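
To see why 0x1c20 and 0x2c20 collide, here is a small sketch of the
address split.  The geometry (64-byte lines, 64 sets per way) is an
assumption made for illustration; the two addresses differ by 0x1000,
so they share an index whenever the size of one way divides 0x1000.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)
NUM_LINES = 64  # number of sets per way (assumed)


def cache_index(addr):
    """Set index selected by the address."""
    return (addr // LINE_SIZE) % NUM_LINES


def cache_tag(addr):
    """Tag bits: everything above the index and line-offset bits."""
    return addr // (LINE_SIZE * NUM_LINES)


touch_addr = 0x1C20
load_addr = 0x2C20

same_index = cache_index(touch_addr) == cache_index(load_addr)
same_tag = cache_tag(touch_addr) == cache_tag(load_addr)
```

The two accesses select the same set but carry different tags, so only
the tag comparison distinguishes them.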

This happens because the touch completes immediately, meaning that the
load gets processed before r1.store_ways and the cache tag for the
line being touched have been set correctly, leading to a chance that
the load can match when it shouldn't (or not match when it should).
To fix this, complete the touch after one cycle, in RELOAD_WAIT_ACK
state, rather than immediately.

Also, for touches, consider hit_reload = 1 equivalent to a cache hit.
If the line is being reloaded then the touch doesn't need to do
anything.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 month ago
Paul Mackerras c78d9b32ef loadstore1: Ensure tlbie instructions get completed
Since commit c938246cc8 ("dcache: Simplify addressing of the dcache
TLB", 2025-04-05), tlbie instructions have been sent down the
loadstore pipe with both req.dc_req and req.mmu_op set, so that the
tlbie gets sent both to the data cache and the MMU.  This is so that
the relevant TLB hit signals are set correctly in the dcache for a
single-page invalidation.  However, this means that loadstore1 was not
sending a completion to writeback for the tlbie.  Normally this
doesn't cause a problem, but if the tlbie is followed by an
instruction that is marked 'single-pipe' in the decode1 tables, such
as sync (any variant), decode2 will then stall forever waiting for the
tlbie to complete before issuing the following instruction.

To fix this, clear req.dc_req in the second loadstore stage for a
tlbie (actually for any MMU operation, but tlbie is the only
instruction that would have dc_req set).

Fixes: c938246cc8 ("dcache: Simplify addressing of the dcache TLB", 2025-04-05)
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 month ago
Paul Mackerras f9dc3ecdc8 execute1: Correct FSCR[IC] value for prefix unavailable interrupt
FSCR[IC] should be set to 13 for a prefix unavailable interrupt, not 11.
To avoid this type of mistake, use the same symbols for setting IC as
for the bit numbers in the rest of FSCR.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 month ago
Paul Mackerras a1624a50da
Merge pull request #457 from paulusmack/fixes
FPU fixes, mostly for bugs found by comparing results from random instruction
sequences (generated by simple_random) with POWER9.
1 month ago
Paul Mackerras 09b340e845 FPU: Update committed FPSCR value correctly
The committed FPSCR is updated in the cycle where an FPU instruction
signals completion.  Since we update the FPRF field in the FPSCR in
that same cycle, the value put into r.comm_fpscr needs to include
the new FPRF value.  Otherwise, a subsequent flush (for example,
due to the following instruction being an illegal instruction that
has to be emulated) will drop the FPSCR update.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras 1ad8848655 FPU: Improve zero result detection and simplify final states
This improves detection of results that are exactly zero in FINISH state
by noting that on entry to FINISH state, if R is zero then X must also
be zero, so no rounding needs to be done and no underflow exists.
Therefore we can set rcls_op = RCLS_TZERO to test for zero and exit
early if R = 0.  The RCLS_TZERO test now tests the whole of R just in
case.

The rest of the following states have been streamlined and simplified.
In cases of underflow, we only need to take action before rounding in
the UE=0 case (disabled underflow exception), where we need to denormalize
before rounding.  For enabled underflow cases we just use the existing
NORMALIZE state, which lets us remove NORM_UFLOW state.

On entry to ROUNDING state, R can be zero or denorm only for round to
integer instructions (fri*) or for disabled underflow exception cases.
Note that in case of underflow with UE=0, the exception is only actually
signalled if there is loss of accuracy, i.e. if FPSCR[FI] will be set.
This is now done at the end of ROUNDING state.  For underflow with UE=1,
we go to a new ROUND_UFLOW_EN state to adjust the exponent from
ROUNDING, ROUNDING_2 or ROUNDING_3 state.

In the ROUNDING* states, we avoid shifting left to normalize a result
with exponent <= -1022, because if we did we would then just need to
denormalize again.  This lets us get rid of DENORM state.

Finally, noticing that DO_FRSP_2 state does much the same as FINISH
state lets us remove DO_FRSP_2 state and go to FINISH state from
DO_FRSP.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras f8a11420ca FPU: Check for rounding overflow in 32-bit convert-to-integer operations
Without this, rounding a value of 0xFFFFFFFF up, giving 0x100000000, will
yield an incorrect result of zero.
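
The arithmetic can be sketched as follows.  The function names are
hypothetical, and saturating to 0xFFFFFFFF with an overflow flag is an
assumption of this sketch, modelled on an unsigned 32-bit conversion;
the commit message itself only states that the unchecked result is
zero.

```python
MAX_U32 = 0xFFFFFFFF


def convert_unchecked(integer_part, round_up):
    """Round, then keep only the low 32 bits (no overflow check)."""
    rounded = integer_part + (1 if round_up else 0)
    return rounded & MAX_U32


def convert_checked(integer_part, round_up):
    """Round, then detect a carry out of bit 31 before truncating.

    Returns (result, overflowed).  Saturation is an assumed policy.
    """
    rounded = integer_part + (1 if round_up else 0)
    if rounded > MAX_U32:
        return MAX_U32, True
    return rounded, False


wrong = convert_unchecked(0xFFFFFFFF, round_up=True)
fixed, overflowed = convert_checked(0xFFFFFFFF, round_up=True)
```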

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras 6fe4b549f5 FPU: Improve accuracy in multiply-add almost-cancellation cases
There are two paths for multiply-add instructions; one where the
product is larger than or nearly the same as the addend, which does the
addition/subtraction in the multiplier with 128-bit accuracy; the
other is used when the addend is clearly larger, which shifts the
product right before doing the addition/subtraction in 64-bit
arithmetic.  The threshold for the second path is that B_exp has
to be greater than A_exp + C_exp + 1, the +1 being because the product
mantissa can be greater than 2.

This increases the +1 to +2 to make sure that the 128-bit path is used
when there is any chance of cancellation of the high-order bits of the
sum.  With the +1 threshold we could still get close to cancellation
when the mantissas of A and C were nearly 2 and the mantissa of B was
1.  This improves accuracy and avoids the need to do a 120-bit
subtraction in the second path.
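
A minimal sketch of the path selection, with a hypothetical function
name and a `margin` parameter standing in for the +1 / +2 constant.
Exponents are unbiased and mantissas are taken to lie in [1, 2).

```python
def uses_shifted_product_path(a_exp, b_exp, c_exp, margin):
    """True if the addend B is considered clearly larger than A * C."""
    return b_exp > a_exp + c_exp + margin


# Near-cancellation case from the text: A and C have mantissas close to 2
# with exponent 0, so A * C is just under 4; B has mantissa 1 and exponent 2,
# so B is exactly 4.
a = c = 2.0 - 2.0 ** -52
b = 4.0
difference = a * c - b  # tiny compared with b

old_choice = uses_shifted_product_path(0, 2, 0, margin=1)
new_choice = uses_shifted_product_path(0, 2, 0, margin=2)
```

With the +1 margin this case takes the 64-bit path even though the
high-order bits almost cancel; with +2 it takes the 128-bit path.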

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras 80c81b58ef FPU: Generate correct result sign when B is denormal
If a subtraction A - B is done where A is in normalized form with an
exponent of -1022, and B is denormal, an inconsistency arises between
the comparison of the raw exponents in the first cycle, which sees
A.exp (0x001) > B.exp (0x000), and the comparison in DO_FADD state,
which sees r.a.exponent (-1022) = r.b.exponent (-1022).  Consequently
we get r.add_bsmall = 0 and the subtraction is done the wrong way
around, yielding the wrong sign for the result.

Fix this by setting r.add_bsmall according to the comparison of raw
exponents in the first cycle and then using it in DO_FADD state.
Also add a test case for this.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras f631dcd700 FPU: Set FPRF correctly on multiply result that underflows
rcls_op being set to RCLS_TZERO was not detecting a zero result after
rounding for a multiply result that underflows, because S still had
low bits of the product.  To fix this, remove the 's_nz = 0' from the
RCLS_TZERO test.  We can't then use this test in the FMADD_6 state,
but we really shouldn't be testing for zero there, before rounding,
so remove that.  Also simplify FMADD_6 state by not setting rs_norm
and going always to FINISH state rather than going to NORMALIZE state.

Add a test for this case (actually a fmadd with B=0).

While here, remove a pointless assignment to f_to_multiply.valid in
MULT_1 state, since r.first is never set here.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras b122577a4e FPU: Be more careful about preserving low-order bits in multiply-add instrs
Add code to check whether bits of S which don't get shifted into R are
non-zero, and set X if they are, so that rounding in multiply-add
instructions works correctly.  This needs to be done after normalization
in the case of very small results, where potentially all the non-zero
bits in S do get shifted into R.

Also fix an incorrect test case, and add another multiply-add test case.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras 59992eab90 FPU: Avoid doing overflow processing twice in OE=1 case
Split the ROUND_OFLOW state into two, one which handles the OE=0 case
(disabled overflow exception) and one which handles the OE=1 case
(enabled overflow exception).  This avoids a loop in the state diagram
and prevents us from adding the exponent bias twice.

Also correct a bug in ROUNDING_3 state where for single-precision
operations which yield a result which is denormal in double-precision
format, r.shift was set wrongly.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras 9f27f60b26 FPU: Clear FPSCR[FR,FI] on overflow in convert-to-integer instructions
Also simplify INT_CHECK state by going to INT_OFLOW on overflow.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras 37edba4da7 FPU: Normalize B operand for multiply-add instructions
Otherwise the result can get rounded incorrectly when B is denorm but
the A * C product is much smaller.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras d33f31509b FPU: Clear S in ADD_SHIFT state
Otherwise, if this is a multiply-add instruction and the result needs
to be shifted left, bits of the product in S will contaminate the
final result.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras b8f7cbd894 FPU: Record bits shifted out of addend in fmadd-family instructions
If the addend is smaller than the product and thus needs to be shifted
right, record if any bits are lost from the right end in r.x, so that
the result gets rounded correctly.

Also add a test that checks one such case.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras 009ee1c9c5 FPU: Renormalize frsp operand if denormalized
This arranges for the frsp operand to be renormalized if necessary.
Without this, we can incorrectly get X set to 1 for denormalized
operands, and hence the rounding may be done incorrectly.  To make
things clearer, we now have an explicit flag indicating when the B
operand needs to be in normalized form.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras baf8f5f8c6 FPU: Force reserved FPSCR bit 11 to zero
This ensures that the reserved FPSCR bit can never be set, by clearing
it at the end of the fpu_1 process.

Also remove a redundant setting of cr_result in the mcrfs code.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras a18c462b27 FPU: Ignore stale P contents in short-circuit multiply-add
When a multiply-add is done with A or C equal to zero, the actual
multiplication operation is not done, hence P is not valid, so in
FINISH state we shouldn't set X based on P being non-zero.  Fix this
by clearing the is_multiply flag in the short-circuit case.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras 41988e3b5f FPU: Fix comparison of remainder in square root code
The square root procedure needs to compare B - R^2 with 2R + 1 to
decide whether to increment the square root estimate R by 1.  It
currently does this by putting 2R + 1 in B and using the pcmpb_lt
and pcmpb_eq signals.  This is not correct because the comparisons
that generate those signals have a 2-bit shift embedded into them.
Instead, put 2R + 1 into C and use pcmpc_lt/eq, which don't have
the 2-bit shift.
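
The decision being made can be illustrated with plain integers.  This
is only a sketch of the comparison, not of the FPU datapath, and the
function name is hypothetical.  It relies on the identity
(R + 1)^2 = R^2 + 2R + 1.

```python
def refine_sqrt(b, r):
    """Increment the estimate r if (r + 1)^2 still does not exceed b."""
    remainder = b - r * r
    if remainder >= 2 * r + 1:
        return r + 1
    return r


exact = refine_sqrt(16, 3)     # remainder 7 equals 2*3 + 1, so r becomes 4
unchanged = refine_sqrt(15, 3) # remainder 6 is less than 7, so r stays 3
```

Any extra shift applied to one side of this comparison changes the
outcome, which is why the comparison signals must be unshifted.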

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras f3b9566ae2 FPU: Round to single precision for fcfid[u]s
The fcfids and fcfidus instructions weren't rounding to single
precision because r.longmask wasn't getting set.  To fix this, set
v.longmask to e_in.single for the fcfid* instructions.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras e5651e2eab FPU: Avoid adding bias twice in UE=1 underflow case
In case of underflow with UE=1, ROUND_UFLOW state adds the exponent
bias and then goes to NORMALIZE state if the value is not normalized.
Then NORMALIZE state will go back to ROUND_UFLOW if the exponent is
still tiny, resulting in the bias getting added twice.  To avoid this,
if ROUND_UFLOW needs to do normalization, it goes to a new NORM_UFLOW
state which does the normalization and goes to ROUNDING state.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras a0755935f4 FPU: Normalize B for fmadd family instructions
If B is denormalized, but the A*C product is much smaller, then the
result is B; in the UE=1 case we need to normalize the result, and the
left shift to do that can bring in low-order product bits from S and
corrupt the result.  To avoid this, make sure B is normalized.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras 32919435a3 FPU: Allow mtfsb* to set FPSCR[FX] implicitly
If mtfsb1 causes an individual exception bit to go from 0 to 1, that
should set FX as well.  Arrange for this by setting update_fx to 1.
Also make sure mcrfs doesn't copy the reserved FPSCR bit.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras e471581222 FPU: Do result processing on denorm short-circuit results when FPSCR[UE] is set
Results that are tiny (i.e., in the denorm range) need special
processing when underflow exceptions are enabled, including in the
cases where the result is just one of the input operands, such as for
a fmadd with A or C equal to zero.  To make sure this gets done, go to
FINISH state rather than returning the relevant input operand as the
result.  The same logic is now used when the result needs to be rounded
to single precision.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras 0478fe41dd FPU: Reset FPSCR[FR,FI] at beginning of fcfid*
Otherwise a non-zero setting from a previous instruction won't get cleared.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras f252dba43d FPU: Only apply zero subtraction result sign rule when result is exactly zero
The rule in the ISA about the sign of the result of a subtraction when
the magnitude of the result is zero only applies when the operands are
equal in magnitude but opposite in sign, i.e. when the result is exactly
zero.  Add a check using FPSCR[FI] to exclude the cases where the exact
result is non-zero but gets truncated to zero by rounding.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras 8a204f1058 FPU: Set FPSCR exception summary based on individual invalid exception bits
Rather than setting FPSCR[FX] to 1 when FPSCR[VX] transitions from 0 to 1,
this sets it when any of the individual invalid exception bits (VXSNAN,
VXISI, VXIDI, VXZDZ, VXIMZ, VXVC, VXSOFT, VXSQRT, VXCVI) transitions from
0 to 1.  This better matches the ISA and P9 behaviour.
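
The difference between the two rules can be sketched like this.  FPSCR
state is modelled as a set of the names of bits that are 1, and both
function names are hypothetical.

```python
INVALID_BITS = ("VXSNAN", "VXISI", "VXIDI", "VXZDZ", "VXIMZ",
                "VXVC", "VXSOFT", "VXSQRT", "VXCVI")


def fx_from_vx_summary(old, new):
    """Old rule: set FX only when the VX summary goes from 0 to 1."""
    old_vx = any(bit in old for bit in INVALID_BITS)
    new_vx = any(bit in new for bit in INVALID_BITS)
    return new_vx and not old_vx


def fx_from_individual_bits(old, new):
    """New rule: set FX when any individual invalid bit goes from 0 to 1."""
    return any(bit in new and bit not in old for bit in INVALID_BITS)


# VXSNAN is already set (so VX is already 1) and VXISI becomes set.
before = {"VXSNAN"}
after = {"VXSNAN", "VXISI"}

old_rule = fx_from_vx_summary(before, after)
new_rule = fx_from_individual_bits(before, after)
```

The two rules only disagree when a new invalid exception bit is raised
while another one is already set.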

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago
Paul Mackerras fb71f62b83 FPU: Round finite special-case results to single precision if required
When a special case is detected, such as a zero operand to an add,
and the operation is a single-precision operation such as fadds,
we need to round the result to single precision instead of just
returning the relevant input operand unmodified.  This accomplishes
that by going to DO_FRSP_2 state from the special-case code for
single-precision operations that return a finite floating-point
result.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 months ago