Otherwise, if this is a multiply-add instruction and the result needs
to be shifted left, bits of the product in S will contaminate the
final result.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
If the addend is smaller than the product and thus needs to be shifted
right, record if any bits are lost from the right end in r.x, so that
the result gets rounded correctly.
Also add a test that checks one such case.
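As an illustration only, the sticky-bit idea above can be modelled on plain integers. The function name and the tuple return are assumptions made for this sketch; the actual FPU records the lost bits in r.x.

```python
def shift_right_sticky(value, shift):
    """Shift a non-negative integer right and report whether any 1 bits fell off.

    Hypothetical helper: the boolean plays the role of the X (sticky) bit
    that a later rounding step would consult.
    """
    if shift <= 0:
        return value, False
    lost = value & ((1 << shift) - 1)   # the bits shifted out on the right
    return value >> shift, lost != 0
```

If the sticky flag were dropped, an addend such as 0b10110 shifted right by 2 would look identical to 0b10100 after the shift, and the two cases could round the same way even though only the first is inexact.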
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This arranges for the frsp operand to be renormalized if necessary.
Without this, we can incorrectly get X set to 1 for denormalized
operands, and hence the rounding may be done incorrectly. To make
things clearer, we now have an explicit flag indicating when the B
operand needs to be in normalized form.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This ensures that the reserved FPSCR bit can never be set, by clearing
it at the end of the fpu_1 process.
Also remove a redundant setting of cr_result in the mcrfs code.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
When a multiply-add is done with A or C equal to zero, the actual
multiplication operation is not done, hence P is not valid, so in
FINISH state we shouldn't set X based on P being non-zero. Fix this
by clearing the is_multiply flag in the short-circuit case.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The square root procedure needs to compare B - R^2 with 2R + 1 to
decide whether to increment the square root estimate R by 1. It
currently does this by putting 2R + 1 in B and using the pcmpb_lt
and pcmpb_eq signals. This is not correct because the comparisons
that generate those signals have a 2-bit shift embedded into them.
Instead, put 2R + 1 into C and use pcmpc_lt/eq, which don't have
the 2-bit shift.
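The comparison described above can be sketched with ordinary integers. This is an analogue under the assumption of exact integer arithmetic, not the fixed-point datapath; the function name is hypothetical.

```python
def correct_root(b, r):
    """Return r + 1 if (r + 1)^2 still fits in b, otherwise r.

    (r + 1)^2 = r^2 + 2r + 1, so the estimate can be incremented exactly
    when the remainder b - r^2 is at least 2r + 1.
    """
    remainder = b - r * r
    if remainder >= 2 * r + 1:
        return r + 1
    return r
```

Any unintended scaling of one side of the comparison, such as an embedded 2-bit shift, changes the threshold and can therefore give the wrong increment decision.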
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The fcfids and fcfidus instructions weren't rounding to single
precision because r.longmask wasn't getting set. To fix this, set
v.longmask to e_in.single for the fcfid* instructions.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
In case of underflow with UE=1, ROUND_UFLOW state adds the exponent
bias and then goes to NORMALIZE state if the value is not normalized.
Then NORMALIZE state will go back to ROUND_UFLOW if the exponent is
still tiny, resulting in the bias getting added twice. To avoid this,
if ROUND_UFLOW needs to do normalization, it goes to a new NORM_UFLOW
state which does the normalization and goes to ROUNDING state.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
If B is denormalized, but the A*C product is much smaller, then the
result is B; in the UE=1 case we need to normalize the result, and the
left shift to do that can bring in low-order product bits from S and
corrupt the result. To avoid this, make sure B is normalized.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
If mtfsb1 causes an individual exception bit to go from 0 to 1, that
should set FX as well. Arrange for this by setting update_fx to 1.
Also make sure mcrfs doesn't copy the reserved FPSCR bit.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Results that are tiny (i.e., in the denorm range) need special
processing when underflow exceptions are enabled, including in the
cases where the result is just one of the input operands, such as for
a fmadd with A or C equal to zero. To make sure this gets done, go to
FINISH state rather than returning the relevant input operand as the
result. The same logic is now used when the result needs to be rounded
to single precision.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The rule in the ISA about the sign of the result of a subtraction when
the magnitude of the result is zero only applies when the operands are
equal in magnitude but opposite in sign, i.e. when the result is exactly
zero. Add a check using FPSCR[FI] to exclude the cases where the exact
result is non-zero but gets truncated to zero by rounding.
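A rough model of the sign selection for a zero-magnitude result follows. The function and parameter names are assumptions for illustration; fi stands for FPSCR[FI] and computed_negative for the sign of the unrounded result.

```python
def zero_result_is_negative(fi, round_toward_neg_inf, computed_negative):
    """Pick the sign of a result whose rounded magnitude is zero.

    Sketch only: when the result is exact (FI = 0) the exact-zero rule
    applies, giving -0 only in round-toward-minus-infinity mode.  When the
    result is inexact (FI = 1) it was truncated to zero, so the sign of the
    unrounded result is kept.
    """
    if not fi:
        return round_toward_neg_inf
    return computed_negative
```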
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Rather than setting FPSCR[FX] to 1 when FPSCR[VX] transitions from 0 to 1,
this sets it when any of the individual invalid exception bits (VXSNAN,
VXISI, VXIDI, VXZDZ, VXIMZ, VXVC, VXSOFT, VXSQRT, VXCVI) transitions from
0 to 1. This better matches the ISA and P9 behaviour.
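The difference between the two rules can be sketched by treating the FPSCR as a set of bit names. The set representation and function name are assumptions for this example, and it models only the contribution of the invalid-operation bits to FX.

```python
INVALID_BITS = frozenset({
    "VXSNAN", "VXISI", "VXIDI", "VXZDZ", "VXIMZ",
    "VXVC", "VXSOFT", "VXSQRT", "VXCVI",
})

def fx_set_by_invalid(old_bits, new_bits):
    """True if any individual invalid exception bit goes from 0 to 1."""
    newly_set = (set(new_bits) - set(old_bits)) & INVALID_BITS
    return bool(newly_set)
```

With VXISI already set (so VX is already 1), a new VXSQRT exception still sets FX under this rule, whereas a rule keyed on the VX summary bit alone would miss it.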
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
When a special case is detected, such as a zero operand to an add,
and the operation is a single-precision operation such as fadds,
we need to round the result to single precision instead of just
returning the relevant input operand unmodified. This accomplishes
that by going to DO_FRSP_2 state from the special-case code for
single-precision operations that return a finite floating-point
result.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The fctiw* instructions return a copy of the value in bits 31..0 in
bits 63..32 of the result on P9, rather than a sign or zero extension
of the word result. Make the FPU do the same.
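For illustration, the 64-bit result layout described above can be written as follows; the function name is hypothetical.

```python
def fctiw_result(word):
    """Build the 64-bit result: the 32-bit word duplicated in both halves."""
    low = word & 0xFFFFFFFF          # bits 31..0 of the converted value
    return (low << 32) | low         # same value copied into bits 63..32
```

A sign extension of 0x80000000 would instead give 0xFFFFFFFF80000000, and a zero extension 0x0000000080000000.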
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
fsel is a move-type instruction, and hence shouldn't affect FPSCR.
Set v.writing_fpr and v.instr_done, rather than setting arith_done,
to achieve this.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The ADD_3 state incorporated some of the logic of the FINISH state, but
in some cases assumed the result couldn't overflow or underflow - which
is not true for single precision operations, if the input operands are
outside the single precision range. Fix this, and simplify things, by
having ADD_3 always go to FINISH state, which does the full overflow and
underflow checking.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The fp_rounding function expects r.x to have been set based on the lower
31 bits of r.r, not 29 as presently done, so change 28 to SP_RBIT-1
(SP_RBIT is 31). Also add a test to check.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Arithmetic instructions where the result is determined without doing any
actual computation (i.e. the input(s) are NaNs, infinities, zeroes etc.)
weren't resetting FR and FI properly. This combines the two blocks that
handle the r.cycle_1_ar = 1 case to fix it.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The result byte needs to be zero when the index byte value is >= 64.
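As a reference point, here is a software sketch of the bit-permute operation following my reading of the ISA's bpermd description (big-endian bit numbering, bit 0 is the most significant). It is not the core's implementation, and the function name is an assumption.

```python
def bpermd(rs, rb):
    """Software model of bpermd on 64-bit integers (sketch only)."""
    result = 0
    for i in range(8):
        # Byte i of RS, with byte 0 being the most significant byte.
        index = (rs >> (56 - 8 * i)) & 0xFF
        if index < 64:
            bit = (rb >> (63 - index)) & 1   # bit 'index' of RB, MSB-first
        else:
            bit = 0                          # out-of-range index selects 0
        result |= bit << (7 - i)             # perm_0 is the MSB of the low byte
    return result
```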
Fixes: 23ff954059 ("core: Change bperm to a simpler and slower implementation", 2025-01-07)
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
LPCR[HEIC] should only disable external interrupts in hypervisor mode,
and not in problem state (user mode). This fixes the expression for
irq_valid to do that.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This aims to simplify the logic in the execute2_1 process. It is not
really necessary to preserve the contents of ex2 when stalled, except
for ex2.e.last_nia; but when stalled, bits which would initiate
downstream actions, such as ex2.e.valid, ex2.e.interrupt and ex2.se,
should be cleared.
Also, the path through stage2_stall to the bypass valid signal has
shown up as a critical path. This dependency is there because the
mfspr instruction to a slow SPR or a PMU SPR should not forward a
result before the instruction is about to complete, because the result
might change (for example when reading the timebase). To avoid this
dependency, we simply don't forward results for mfspr to slow/PMU
SPRs.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The ISA specifies that mfspr or mtspr to SPR 0, 4, 5 or 6 should
generate a hypervisor emulation assistance interrupt in privileged
mode, so this adds logic to do that.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This implements the EVIRT bit in the LPCR register. When set to 1,
EVIRT causes mfspr and mtspr to an undefined SPR number in privileged
mode (i.e. hypervisor mode) to cause a hypervisor emulation assistance
interrupt. When set to 0, such instructions are executed as no-ops.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
When mfspr is performed to one of the reserved no-op SPRs, or to an
undefined SPR in privileged state, the behaviour is a no-op, that is,
the destination register is not written. Previously this was done by
writing back the same value that the register had before the
instruction, but in fact it can be done simply by negating the write
enable signal so that the result GPR is not written. This gives a
small reduction in logic complexity.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
In order to improve timing, the bypass paths now carry the register
number being written as well as the tag. The decisions about which
bypasses to use for which operands are then made by comparing the
register numbers rather than by determining a tag from the register
number and then comparing tags.
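A simplified model of selecting a bypass by register number is sketched below. The record layout, the names, and the first-match priority are all assumptions made for illustration, not the actual signal structure.

```python
from typing import NamedTuple, Optional, Sequence

class Bypass(NamedTuple):
    valid: bool
    reg: int    # register number being written (now carried on the bypass)
    tag: int    # instruction tag (still carried alongside)
    data: int

def select_bypass(operand_reg: int, bypasses: Sequence[Bypass]) -> Optional[int]:
    """Return the index of the first valid bypass writing operand_reg, or None."""
    for i, b in enumerate(bypasses):
        if b.valid and b.reg == operand_reg:
            return i
    return None
```

The point of the change is that the match is a direct comparison of register numbers, with no register-to-tag lookup in front of the comparison.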
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
When an arithmetic instruction generates an invalid operation
exception or a divide by zero exception, and that exception is enabled
in the FPSCR, the writing of the result to the destination register
should be suppressed, leaving whatever value was last written in the
destination. Add a check that this occurs correctly, for the cases of
square root of a negative number (invalid operation exception) and
division by zero (zero divide exception).
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
If we have two successive instructions that write the same result
register and then a third that uses the same register as an input, and
the second instruction suppresses the write of its result, we can
currently end up with the third instruction using the wrong value,
because it uses the register value from before the first instruction
rather than the result of the first instruction. (An example of an
instruction suppressing the write of its result is a floating-point
instruction that generates an enabled invalid operation exception but
not an interrupt.)
To fix this, the control module now uses any forwarded value for the
register we want, not just the most recent value, but still stalls
until it has the most recent value, or the previous instruction
completes. Thus in the case described above, decode2 will have
latched the value from the first instruction and so the third
instruction gets the correct value.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Code in the execute1_actions process that handles illegal and facility
unavailable interrupts was setting actions.se.set_heir or
actions.se.set_ic, but then because actions.exception was also set,
the contents of actions.se were ignored, meaning that HEIR or FSCR[IC]
were not getting updated. To fix this, execute1_1 now conditions use
of those fields on valid_in rather than go.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
In privileged mode, mfspr from an undefined or unimplemented SPR
number should be a no-op, which is implemented here by writing back
the same value that the destination register previously had. However,
we ended up writing back 0 because ex1.res2_sel was not set correctly.
To fix this, set res2_sel to 10 in the undefined SPR case.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
In the fall of 2020, cmd/clk scan in liblitedram was changed in a way
that required reverting cmd_latency being set to 1 in LiteDRAM commit
4e62d28 back to 0. For the default in s7ddrphy.py this revert happened
in 496cd27, but for standalone gen the .yml was never updated in either
LiteDRAM or Microwatt, leading to a regression:
https://github.com/antonblanchard/microwatt/issues/363
The present commit updates the .yml so DRAM works on Genesys2 again.
See also
https://github.com/enjoy-digital/litedram/pull/368
for a corresponding update to the .yml in LiteDRAM.
Signed-off-by: Boris Shingarov <shingarov@labware.com>
This implements the hypervisor doorbell exception and interrupt and
the msgsnd, msgclr and msgsync instructions (msgsync is a no-op). The
msgsnd instruction can generate a hypervisor doorbell interrupt on any
CPU in the system. To achieve this, each core sends its hypervisor
doorbell messages to the SoC level, which ORs together the bits for
each CPU and sends the result to that CPU.
The privileged doorbell exception/interrupt and the msgsndp/msgclrp
instructions are not required since we don't implement SMT.
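The routing of hypervisor doorbell messages can be pictured with the sketch below. It assumes each core presents a bitmask with one bit per target CPU; that encoding and the function name are assumptions for this example.

```python
from typing import List, Sequence

def route_doorbells(core_msgs: Sequence[int]) -> List[bool]:
    """OR the per-core message masks and return one doorbell flag per CPU.

    core_msgs[i] is the mask sent by core i; bit n set means "ring CPU n".
    """
    combined = 0
    for mask in core_msgs:
        combined |= mask
    return [bool((combined >> cpu) & 1) for cpu in range(len(core_msgs))]
```

For example, with three cores where core 0 targets CPU 1 and core 2 targets CPU 0, CPUs 0 and 1 each see a doorbell and CPU 2 does not.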
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This replaces OP_ADDG6S, OP_BCD, OP_BREV, OP_CMPB, OP_CMPEQB,
OP_CMPRB, OP_CROP, OP_EXTS, OP_EXTSWSLI, OP_ISEL, OP_LOGIC, OP_MFCR,
OP_PRTY, OP_RLC, OP_RLCL, OP_RLCR, OP_SETB, OP_SHL, OP_SHR,
and OP_XOR with a single OP_COMPUTE. The replaced operations are all
ones which just compute a result value (for GPR or CR) in execute1,
don't have any other side effects, and aren't used in decode2 to
determine other signals. The operation to be performed is
sufficiently defined by the result and subresult fields in the decode
table. With the elimination of OP_SPARE, this reduces the number of
insn_type_t values to 44.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>