Commit Graph

91 Commits (f3b9566ae2f2268be31dfb0f684ed977e05252bc)

Author SHA1 Message Date
Paul Mackerras f3b9566ae2 FPU: Round to single precision for fcfid[u]s
The fcfids and fcfidus instructions weren't rounding to single
precision because r.longmask wasn't getting set.  To fix this, set
v.longmask to e_in.single for the fcfid* instructions.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras e5651e2eab FPU: Avoid adding bias twice in UE=1 underflow case
In case of underflow with UE=1, ROUND_UFLOW state adds the exponent
bias and then goes to NORMALIZE state if the value is not normalized.
Then NORMALIZE state will go back to ROUND_UFLOW if the exponent is
still tiny, resulting in the bias getting added twice.  To avoid this,
if ROUND_UFLOW needs to do normalization, it goes to a new NORM_UFLOW
state which does the normalization and goes to ROUNDING state.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras a0755935f4 FPU: Normalize B for fmadd family instructions
If B is denormalized, but the A*C product is much smaller, then the
result is B; in the UE=1 case we need to normalize the result, and the
left shift to do that can bring in low-order product bits from S and
corrupt the result.  To avoid this, make sure B is normalized.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 32919435a3 FPU: Allow mtfsb* to set FPSCR[FX] implicitly
If mtfsb1 causes an individual exception bit to go from 0 to 1, that
should set FX as well.  Arrange for this by setting update_fx to 1.
Also make sure mcrfs doesn't copy the reserved FPSCR bit.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras e471581222 FPU: Do result processing on denorm short-circuit results when FPSCR[UE] is set
Results that are tiny (i.e., in the denorm range) need special
processing when underflow exceptions are enabled, including in the
cases where the result is just one of the input operands, such as for
a fmadd with A or C equal to zero.  To make sure this gets done, go to
FINISH state rather than returning the relevant input operand as the
result.  The same logic is now used when the result needs to be rounded
to single precision.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 0478fe41dd FPU: Reset FPSCR[FR,FI] at beginning of fcfid*
Otherwise a non-zero setting from a previous instruction won't get cleared.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras f252dba43d FPU: Only apply zero subtraction result sign rule when result is exactly zero
The rule in the ISA about the sign of the result of a subtraction when
the magnitude of the result is zero only applies when the operands are
equal in magnitude but opposite in sign, i.e. when the result is exactly
zero.  Add a check using FPSCR[FI] to exclude the cases where the exact
result is non-zero but gets truncated to zero by rounding.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 8a204f1058 FPU: Set FPSCR exception summary based on individual invalid exception bits
Rather than setting FPSCR[FX] to 1 when FPSCR[VX] transitions from 0 to 1,
this sets it when any of the individual invalid exception bits (VSXNAN,
VXISI, VXIDI, VXZDZ, VXIMZ, VXVC, VXSOFT, VXSQRT, VXCVI) transitions from
0 to 1.  This better matches the ISA and P9 behaviour.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras fb71f62b83 FPU: Round finite special-case results to single precision if required
When a special case is detected, such as a zero operand to an add,
and the operation is a single-precision operation such as fadds,
we need to round the result to single precision instead of just
returning the relevant input operand unmodified.  This accomplishes
that by going to DO_FRSP_2 state from the special-case code for
single-precision operations that return a finite floating-point
result.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras de71a6119c FPU: Make FPSCR bit 11 always read as 0
Bit 11 (52 in BE numbering) is a reserved bit.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras ca792f3b13 FPU: Make convert-to-integer-word instructions behave like P9
The fctiw* instructions return a copy of the value in bits 31..0 in
bits 63..32 of the result on P9, rather than a sign or zero extension
of the word result.  Make the FPU do the same.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 82825a11ba FPU: Set result sign correctly for denorm +/- 0 case
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 37b1afc7f7 FPU: Make fri* instructions set FPSCR[FR,FI] to zero
As required by the ISA.  Also, never generate an inexact exception.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras dcd85164c6 FPU: Make fsel not alter FPSCR
fsel is a move-type instruction, and hence shouldn't affect FPSCR.
Set v.writing_fpr and v.instr_done, rather than setting arith_done,
to achieve this.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 066e38b8ea FPU: Do proper over/underflow handling for single-precision [fm]add
The ADD_3 state incorporated some of the logic of the FINISH state, but
in some cases assumed the result couldn't overflow or underflow - which
is not true for single precision operations, if the input operands are
outside the single precision range.  Fix this, and simplify things, by
having ADD_3 always go to FINISH state, which does the full overflow and
underflow checking.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras d540171f60 FPU: Ignore Rc bit for mffs* variants other than plain mffs
Bit 0 of the instruction is Rc for mffs but reserved for the other
mffs* instructions.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 0e11f80f2f FPU: Set FPSCR[FPRF] to zero for convert to integer operations
This seems to be what P9 does.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 2f29daab2d FPU: Fix setting of r.x for single-precision operations
The fp_rounding function expects r.x to have been set based on the lower
31 bits of r.r, not 29 as presently done, so change 28 to SP_RBIT-1
(SP_RBIT is 31).  Also add a test to check.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras ab3783b61b FPU: Fix setting of r.x
Having computed rormr, use it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras e60840eabc FPU: Make sure FR and FI in FPSCR get reset on special-case arith instructions
Arithmetic instructions where the result is determined without doing any
actual computation (i.e. the input(s) are NaNs, infinities, zeroes etc.)
weren't resetting FR and FI properly.  This combines the two blocks that
handle the r.cycle_1_ar = 1 case to fix it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 week ago
Paul Mackerras 4282d37741 FPU: Faster method for testing for 1-bits at right end of R
At various points we need to set the X bit if any bit of R which would
be shifted out by a right shift of N bits is a 1.  We can do this by
computing R | -R, which contains a 1 in the position of the right-most
1-bit in R and in all positions to the left, and zeroes to the right.
That means we can test for the least-significant N bits being non-zero
by testing whether bit N-1 of (R | -R) is a 1.  Doing this uses fewer
LUTs and has better timing than the old method of generating a mask,
ANDing it with R, and testing whether the result is non-zero.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
8 months ago
Paul Mackerras 3268ef717c FPU: Make opsel_a a function of just the state
This adds some extra states and transitions so that opsel_a becomes
a function only of the current state.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 73505b1626 FPU: Provide a separate path for transferring A/B/C to R
The timing path from r.a.class to result showed up as a critical path
on the Artix-7, apparently because of transfers of A, B or C to R in
special cases (e.g. NaN inputs) and the fsel instruction.  To
alleviate this, we provide a path via the miscellaneous value
multiplexer from A, B and C to R, selected via opsel_R = RES_MISC and
misc_sel = 111.  A new selector opsel_sel selects which of A, B or C
to transfer, using the same encoding as opsel_a.  This new selector is
now also used for the result class when rcls_op = RCLS_SEL and for the
result sign when rsgn_op = RSGN_SEL.  This reduces the number of
things that opsel_a depends on and eases timing in the main adder
path.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras b63773f6e9 FPU: Move computation of main adder inputs out of the state machine
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras b4aae8511d FPU: Move special case handling to a separate process
This creates a new fpu_specialcases process that handles most of the
logic that was previously in the DO_NAN_INF and DO_ZERO_DEN states.
What remains of those states, i.e. the handling of denormalized
inputs, is in a new DO_SPECIAL state.  The state machine goes into
DO_SPECIAL state after IDLE for any arithmetic operation where an
input is a NaN, infinity, zero or denormalized value.  Doing this
means that the rest of the state machine won't try to start any
computation which would need to be overridden by the logic to produce
the result value selected by the fpu_specialcases process.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras b1bd2aa865 FPU: Make set_r independent of multiply_to_f.valid
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras fcfdbc449c FPU: Move condition register calculations to an explicit data path
Instead of calculating v.cr_result in the state machine, we now have
the state machine set a 'cr_op' variable which then controls what
computation the CR data path does to set v.cr_result.  The CR data
path also handles updating the XERC result bits for integer operations
(division and modulus).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras bbc485f336 FPU: Rework inputs to the main adder
With this, the A input no longer has R as an option but now takes the
rounding constants and the low-order bits of P (used as an adjustment
in the square root algorithm).  The B input has either R or zero.
Both inputs can be optionally inverted for subtraction.  The select
inputs to the multiplexers now have 3 bits in opsel_a and 1 bit in
opsel_b.

The states which need R to be set now explicitly have set_r := 1 even
though that is the default, essentially for documentation reasons.
Similarly some states set opsel_b <= BIN_R even though that is the
default.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 0e7c11a0e4 FPU: Move result_class logic outside of state machine
The various states choose one of four operations (including no-op) to
be done on result_class.  Some operations have side-effects on
arith_done or FPSCR.  The DO_NAN_INF and DO_ZERO_DEN states still set
result_class directly since their logic is expected to move out to a
separate process later.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 5f0b2d433d FPU: Simplify calculation of result_class
For the various arithmetic operators, we only get to the DO_* states
when the inputs are finite (not zero, infinity or NaN), so we can
replace setting of v.result_class to r.a.class or r.b.class with a
overall setting of it to FINITE in cycle 1 of all those operations.

Also, integer division doesn't need to set the result class since the
result is integer.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 70819c4c39 FPU: Do renormalization from DO_ZERO_DEN state
Instead of having the various DO_* states (DO_FMUL, DO_FDIV, etc.)
handle checking for denormalized inputs, we now have DO_ZERO_DEN state
check for denormalized inputs and branch to RENORM_{A,B,C} to handle
them.

This also meant some changes were needed in how fsqrt and frsqrte
handled inputs with odd exponent.  The DO_FSQRT and DO_FRSQRTE states
were very similar and have been combined into one.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 8648ddb64f FPU: Eliminate EXC_RESULT state
This lets us remove r.opsel_a and is a step towards moving the
handling of exceptional cases out to a separate process.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 850b87c83f FPU: Get rid of r.madd_cmp and r.exp_cmp
This saves a few LUTs and simplifies the code a little.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras ba2add029a FPU: Remove need to set opsel_a one cycle ahead
Most states set opsel_a directly to select the operand for the A input
of the main adder.  The exception is the EXC_RESULT state, which uses
r.opsel_a set by the previous cycle to indicate which input operand to
use as the result.

In order to make timing, ensure that the controls that select the
inputs to the main adder (opsel_*, etc.) don't depend on any
complicated functions of the data (such as px_nz, pcmpb_eq, pcmpb_lt,
etc.), but are as far as possible constant for each state.  There is
now a control called set_r for whether the result is written to r.r,
which enables us to avoid setting opsel_b or opsel_r conditionally in
some cases.

Also, to avoid a data-dependent setting of msel_2 in IDIV_DODIV state,
the IDIV_NR1 and IDIV_NR2 states have been reworked so that completion
of the required number of iterations is checked in IDIV_NR1 state, and
at that point, if the inverse estimate is < 0.5, we go to IDIV_USE0_5
state in order to use 0.5 as the estimate.  This means that in the
normal case, the inverse estimate is already in Y when we get to
IDIV_DODIV state.  IDIV_USE0_5 has been reworked to put R (which will
contain 0.5) into Y as the inverse estimate.  That means that
IDIV_DODIV state doesn't have any data-dependent logic to put either P
or R into Y.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 2731384a4b FPU: Reduce misc_sel to 3 bits
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras cf866ce910 FPU: Simplify logic for setting r.x
Since r.x is mostly set from the value in r.r and only once from
anything else (r.b.mantissa), move the check to before the input
multiplexer for the main adder, so it works on r.r rather than
whatever is selected by r.opsel_a.

For the case in DO_FRSP where we have B selected by r.opsel_a, we add
a new state so that we now get B into R and then check the low bits of
R.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 4e5f856c55 FPU: Factor out some of the common elements of the DO_* states
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 2422585e14 FPU: Reduce use of r.insn inside the state machine
Instead use things derived from the instruction in the first cycle,
such as r.is_multiply, r.is_addition, etc.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 7812a55b6c FPU: Reorganize NaN and infinity handling and improve arch compliance
The architecture specifies that an invalid operation exception for
signalling NaN (VXSNAN) can occur in the same instructions as an
invalid operation exception for infinity times zero (VXIMZ) in the
case of a multiply-add instruction where B is a signalling NaN, and
one of A and C is infinity and the other is zero.  This moves the
invalid operation tests around so as to handle this case correctly.
It also restructures the infinity and NaN cases to simplify the logic
a little.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras a3613d863b FPU: Simplify sign calculation in FP multiply-add instructions
By starting out with result_sign = +/- sign of B, we avoid the need to
flip the result sign in a few places.

This also simplifies DO_FMADD state a bit by having DO_ZERO_DEN go to
DO_FMUL state for floating multiply-add where B is zero.  (The
RENORM_A2 and RENORM_C2 states already do this.)

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 707dd619a0 FPU: Move NaN/infinity and zero/denorm handling out to separate states
This should simplify the DO_* states and hopefully be simpler overall.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 27b3e42353 FPU: Move result_sign computations from state machine to a data path
Instead of operating on result_sign directly, the state machine now
sets a control variable "rsgn_op" that then directs a tiny ALU to do
what's required.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 71b7df679b FPU: Calculate quieten_nan in first cycle
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 955fa561fb FPU: Move most result_sign computation out of state machine
This moves the computation of r.result_sign out of the various
states for most instructions.  Now the sign is mostly computed in the
first cycle (when e_in.valid is true).

The set of operations done on r.result_sign in the state machine are
now restricted to 5 (other than no change): invert, xor with
r.is_subtract, or set to the sign of A, B or C.

Similarly r.is_subtract and r.negate are computed in the first cycle
now.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras f4d28d1521 FPU: Fix ftdiv and ftsqrt instructions
With ftdiv, we weren't setting result_exp to B.exponent before
testing result_exp in state FTDIV_1; the fix is to transfer B.exponent
to result_exp in state DO_FTDIV.

With ftsqrt, we were setting bit 1 of the destination CR field to 0
always, due to a typo.

Also move a couple of statements around to try to get slightly simpler
logic.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 95595af08d FPU: Fix typo in expression for exp_huge
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 18911455c6
FPU: Fix fsel instruction to not alter FPSCR (#426)
The fsel instruction is not supposed to alter FPSCR, but it was
clearing FR and FI.  Fix this.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 51954671f3 FPU: Fix behaviour of fdiv with denormalized divisor
Renormalization of the divisor for fdiv[s] was adjusting the result
exponent in the wrong direction, making the result smaller in
magnitude than it should be by a power of 2.  Fix this by negating
r.shift in the RENORM_B2 state and then subtracting it in the LOOKUP
cycle.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras eecf1ca399 FPU: Fix setting of FPRF
The sign recorded in FPRF was sometimes wrong because we weren't doing
the modifications that were done in pack_dp when setting FPRF (FPSCR
field).  These modifications are: set sign for zero result of
subtraction based on rounding mode; negate result for fnmadd/sub;
but don't modify sign of NaNs.

Instead we now do these modifications in the main state machine code
and put the result in an 'rsign' variable that is used to set
v.res_sign, then r.res_sign is used in the next cycle both for setting
FPRF and in the pack_dp functions.  That simplifies pack_dp and lets
us get rid of r.res_negate, r.res_subtract and r.res_rmode.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras e02d8060ed Change the multiplier interface to support signed multipliers
This adds an 'is_signed' signal to MultiplyInputType to indicate
whether the data1 and data2 fields are to be interpreted as signed or
unsigned numbers.

The 'not_result' field is replaced by a 'subtract' field which
provides a more intuitive interface for requesting that the product be
subtracted from the addend rather than added, i.e. subtract = 1 gives
C - A * B, vs. subtract = 0 giving C + A * B.  (Previously the users
of the multipliers got the same effect by complementing the addend and
setting not_result = 1.)

The is_32bit field is removed because it is no longer used now that we
have a separate 32-bit multiplier.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago