This adds some extra states and transitions so that opsel_a becomes
a function only of the current state.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The timing path from r.a.class to result showed up as a critical path
on the Artix-7, apparently because of transfers of A, B or C to R in
special cases (e.g. NaN inputs) and the fsel instruction. To
alleviate this, we provide a path via the miscellaneous value
multiplexer from A, B and C to R, selected via opsel_R = RES_MISC and
misc_sel = 111. A new selector opsel_sel selects which of A, B or C
to transfer, using the same encoding as opsel_a. This new selector is
now also used for the result class when rcls_op = RCLS_SEL and for the
result sign when rsgn_op = RSGN_SEL. This reduces the number of
things that opsel_a depends on and eases timing in the main adder
path.
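A rough Python model of the new transfer path may help. It assumes operands are (class, sign, mantissa) tuples. The selector values AIN_A/AIN_B/AIN_C and the `other_values` table are placeholders: the text above only says that opsel_sel shares opsel_a's encoding and that misc_sel = 111 selects the transfer. One selector drives the value, the class and the sign alike.

```python
from typing import NamedTuple


class FpOperand(NamedTuple):
    cls: str       # e.g. "FINITE", "ZERO", "INFINITY", "NAN"
    sign: int      # 0 or 1
    mantissa: int


# Placeholder selector values; the real encoding is whatever opsel_a uses.
AIN_A, AIN_B, AIN_C = 0, 1, 2
MISC_SEL_TRANSFER = 0b111


def selected_operand(opsel_sel, a, b, c):
    """One selector picks A, B or C for value, class and sign alike."""
    return {AIN_A: a, AIN_B: b, AIN_C: c}[opsel_sel]


def misc_mux(misc_sel, opsel_sel, a, b, c, other_values):
    """Miscellaneous value mux; misc_sel = 111 transfers the selected operand."""
    if misc_sel == MISC_SEL_TRANSFER:
        return selected_operand(opsel_sel, a, b, c).mantissa
    return other_values[misc_sel]   # placeholder for the other misc values


def result_class_sel(opsel_sel, a, b, c):
    """What rcls_op = RCLS_SEL would produce."""
    return selected_operand(opsel_sel, a, b, c).cls


def result_sign_sel(opsel_sel, a, b, c):
    """What rsgn_op = RSGN_SEL would produce."""
    return selected_operand(opsel_sel, a, b, c).sign
```

Because opsel_sel is a per-state constant, none of these paths involve opsel_a.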
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This creates a new fpu_specialcases process that handles most of the
logic that was previously in the DO_NAN_INF and DO_ZERO_DEN states.
What remains of those states, i.e. the handling of denormalized
inputs, is in a new DO_SPECIAL state. The state machine goes into
DO_SPECIAL state after IDLE for any arithmetic operation where an
input is a NaN, infinity, zero or denormalized value. Doing this
means that the rest of the state machine won't try to start any
computation which would need to be overridden by the logic to produce
the result value selected by the fpu_specialcases process.
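The routing decision out of IDLE can be sketched as follows. The class labels are hypothetical, and the real design may detect denormals from the exponent rather than from a separate class. `normal_state` stands for whichever DO_* state the operation would otherwise start in.

```python
SPECIAL_CLASSES = {"NAN", "INFINITY", "ZERO", "DENORM"}  # hypothetical labels


def next_state_from_idle(is_arith, input_classes, normal_state):
    """Route any arithmetic op with a special input to DO_SPECIAL.

    normal_state is the DO_* state the op would otherwise start in.
    """
    if is_arith and any(cls in SPECIAL_CLASSES for cls in input_classes):
        return "DO_SPECIAL"
    return normal_state
```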
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Instead of calculating v.cr_result in the state machine, we now have
the state machine set a 'cr_op' variable which then controls what
computation the CR data path does to set v.cr_result. The CR data
path also handles updating the XERC result bits for integer operations
(division and modulus).
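In this minimal sketch of the split, the state machine only chooses a cr_op and a separate function computes cr_result. The CrOp names are hypothetical. The two non-trivial operations shown use the architected layouts: FL/FG/FE/FU for a floating compare, and FX/FEX/VX/OX copied to CR1 for Rc=1 FP instructions. The XERC update for division and modulus is not modelled.

```python
from enum import Enum, auto


class CrOp(Enum):          # hypothetical names for the cr_op control
    NONE = auto()          # leave cr_result unchanged
    FCMP = auto()          # floating compare: FL FG FE FU
    FP_SUMMARY = auto()    # Rc=1 FP ops: CR1 <- FX FEX VX OX


def pack4(b0, b1, b2, b3):
    """Pack four flags into a 4-bit CR field, b0 being the most significant."""
    return (b0 << 3) | (b1 << 2) | (b2 << 1) | b3


def cr_datapath(cr_op, old_cr, cmp_flags=(0, 0, 0, 0), fpscr_flags=(0, 0, 0, 0)):
    """Compute cr_result from the cr_op chosen by the state machine."""
    if cr_op is CrOp.NONE:
        return old_cr
    if cr_op is CrOp.FCMP:
        return pack4(*cmp_flags)       # (lt, gt, eq, unordered)
    if cr_op is CrOp.FP_SUMMARY:
        return pack4(*fpscr_flags)     # (FX, FEX, VX, OX)
    raise ValueError(cr_op)
```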
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
With this, the A input no longer has R as an option but now takes the
rounding constants and the low-order bits of P (used as an adjustment
in the square root algorithm). The B input has either R or zero.
Both inputs can be optionally inverted for subtraction. The select
inputs to the multiplexers now have 3 bits in opsel_a and 1 bit in
opsel_b.
The states which need R to be set now explicitly have set_r := 1 even
though that is the default, essentially for documentation reasons.
Similarly some states set opsel_b <= BIN_R even though that is the
default.
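A behavioural sketch of the adder inputs as described: up to eight A sources behind a 3-bit opsel_a, R or zero behind a 1-bit opsel_b, and optional inversion of either input. The BIN_ZERO/BIN_R values and the 64-bit width are assumptions. Subtraction is shown as one's-complement inversion plus a carry-in of 1.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1

# Hypothetical encodings: opsel_b is one bit choosing zero or R.
BIN_ZERO, BIN_R = 0, 1


def main_adder(a_sources, opsel_a, r, opsel_b,
               invert_a=False, invert_b=False, carry_in=0):
    """Model of the adder input muxes described above.

    a_sources holds up to 8 candidates for the A input (3-bit opsel_a),
    e.g. operands, rounding constants, low-order bits of P.
    """
    assert len(a_sources) <= 8 and 0 <= opsel_a < len(a_sources)
    a = a_sources[opsel_a]
    b = r if opsel_b == BIN_R else 0
    if invert_a:
        a = ~a & MASK          # one's complement; carry_in=1 completes negation
    if invert_b:
        b = ~b & MASK
    return (a + b + carry_in) & MASK
```

For example, `main_adder([5], 0, 3, BIN_R, invert_b=True, carry_in=1)` computes 5 - 3.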
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The various states choose one of four operations (including no-op) to
be done on result_class. Some operations have side-effects on
arith_done or FPSCR. The DO_NAN_INF and DO_ZERO_DEN states still set
result_class directly since their logic is expected to move out to a
separate process later.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
For the various arithmetic operators, we only get to the DO_* states
when the inputs are finite (not zero, infinity or NaN), so we can
replace setting of v.result_class to r.a.class or r.b.class with an
overall setting of it to FINITE in cycle 1 of all those operations.
Also, integer division doesn't need to set the result class since the
result is integer.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Instead of having the various DO_* states (DO_FMUL, DO_FDIV, etc.)
handle checking for denormalized inputs, we now have DO_ZERO_DEN state
check for denormalized inputs and branch to RENORM_{A,B,C} to handle
them.
This also meant some changes were needed in how fsqrt and frsqrte
handled inputs with odd exponents. The DO_FSQRT and DO_FRSQRTE states
were very similar and have been combined into one.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This lets us remove r.opsel_a and is a step towards moving the
handling of exceptional cases out to a separate process.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Most states set opsel_a directly to select the operand for the A input
of the main adder. The exception is the EXC_RESULT state, which uses
r.opsel_a set by the previous cycle to indicate which input operand to
use as the result.
In order to make timing, ensure that the controls that select the
inputs to the main adder (opsel_*, etc.) don't depend on any
complicated functions of the data (such as px_nz, pcmpb_eq, pcmpb_lt,
etc.), but are as far as possible constant for each state. There is
now a control called set_r for whether the result is written to r.r,
which enables us to avoid setting opsel_b or opsel_r conditionally in
some cases.
Also, to avoid a data-dependent setting of msel_2 in IDIV_DODIV state,
the IDIV_NR1 and IDIV_NR2 states have been reworked so that completion
of the required number of iterations is checked in IDIV_NR1 state, and
at that point, if the inverse estimate is < 0.5, we go to IDIV_USE0_5
state in order to use 0.5 as the estimate. This means that in the
normal case, the inverse estimate is already in Y when we get to
IDIV_DODIV state. IDIV_USE0_5 has been reworked to put R (which will
contain 0.5) into Y as the inverse estimate. That means that
IDIV_DODIV state doesn't have any data-dependent logic to put either P
or R into Y.
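The reworked decision point can be sketched as a next-state function. This is a hypothetical model: it assumes the not-yet-done case loops via IDIV_NR2 and that IDIV_USE0_5 then proceeds to IDIV_DODIV. The point is that IDIV_DODIV itself no longer has to choose between P and R.

```python
def idiv_nr1_next(iterations_done, iterations_needed, inverse_estimate):
    """Next state after IDIV_NR1 (hypothetical model of the reworked flow)."""
    if iterations_done < iterations_needed:
        return "IDIV_NR2"          # keep refining the estimate
    if inverse_estimate < 0.5:
        return "IDIV_USE0_5"       # R holds 0.5; this state copies it into Y
    return "IDIV_DODIV"            # estimate is already in Y
```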
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Since r.x is mostly set from the value in r.r and only once from
anything else (r.b.mantissa), move the check to before the input
multiplexer for the main adder, so it works on r.r rather than
whatever is selected by r.opsel_a.
For the case in DO_FRSP where we have B selected by r.opsel_a, we add
a new state so that we now get B into R and then check the low bits of
R.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Instead use things derived from the instruction in the first cycle,
such as r.is_multiply, r.is_addition, etc.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The architecture specifies that an invalid operation exception for
signalling NaN (VXSNAN) can occur in the same instruction as an
invalid operation exception for infinity times zero (VXIMZ) in the
case of a multiply-add instruction where B is a signalling NaN, and
one of A and C is infinity and the other is zero. This moves the
invalid operation tests around so as to handle this case correctly.
It also restructures the infinity and NaN cases to simplify the logic
a little.
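A minimal sketch of why the two tests must be independent, for a multiply-add computing A*C + B. The operand kinds are simplified labels, and other invalid-operation cases such as VXISI are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    kind: str   # "zero", "finite", "infinity", "qnan" or "snan"


def fmadd_invalid_flags(a, b, c):
    """Return (VXSNAN, VXIMZ) for a multiply-add computing A*C + B.

    The two tests are independent, so both can be set at once, for
    example when B is a signalling NaN and A*C is infinity times zero.
    """
    vxsnan = any(x.kind == "snan" for x in (a, b, c))
    vximz = {a.kind, c.kind} == {"infinity", "zero"}
    return vxsnan, vximz
```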
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
By starting out with result_sign = +/- sign of B, we avoid the need to
flip the result sign in a few places.
This also simplifies DO_FMADD state a bit by having DO_ZERO_DEN go to
DO_FMUL state for floating multiply-add where B is zero. (The
RENORM_A2 and RENORM_C2 states already do this.)
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Instead of operating on result_sign directly, the state machine now
sets a control variable "rsgn_op" that then directs a tiny ALU to do
what's required.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This moves the computation of r.result_sign out of the various
states for most instructions. Now the sign is mostly computed in the
first cycle (when e_in.valid is true).
The set of operations done on r.result_sign in the state machine is
now restricted to five (other than no change): invert, xor with
r.is_subtract, or set to the sign of A, B or C.
Similarly r.is_subtract and r.negate are computed in the first cycle
now.
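The "tiny ALU" driven by rsgn_op can be sketched as below. The operation names are hypothetical, but the set matches the five operations listed above plus "no change". All signs are single bits.

```python
from enum import Enum, auto


class RsgnOp(Enum):        # hypothetical names for the rsgn_op control
    NOP = auto()           # no change
    INV = auto()           # invert
    SUB = auto()           # xor with is_subtract
    SEL_A = auto()         # take the sign of A
    SEL_B = auto()         # take the sign of B
    SEL_C = auto()         # take the sign of C


def rsgn_alu(op, sign, is_subtract, a_sign, b_sign, c_sign):
    """Tiny ALU applied to result_sign each cycle; all inputs are 0 or 1."""
    if op is RsgnOp.NOP:
        return sign
    if op is RsgnOp.INV:
        return sign ^ 1
    if op is RsgnOp.SUB:
        return sign ^ is_subtract
    return {RsgnOp.SEL_A: a_sign, RsgnOp.SEL_B: b_sign, RsgnOp.SEL_C: c_sign}[op]
```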
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Instead of a single global timebase register in the SoC, we now have
a timebase counter in each core; however, now they are only reset by
the SoC reset, not the core reset. Thus they stay in sync even when
some cores are disabled (via the syscon cpu_ctrl register).
This implements mtspr to the TBLW and TBUW SPRs, which write the lower
and upper 32 bits of this core's timebase, respectively.
In order to fulfil the ISA's requirements that (a) some method for
getting the timebases into sync and (b) some method for preventing
userspace from reading the timebase be provided by the platform, this
adds a syscon register TB_CTRL with two read/write bits implemented;
bit 0 freezes all the timebases in the system when set, and bit 1
makes reading the timebase privileged (in all cores).
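A sketch of one core's timebase under the TB_CTRL bits described above. The class and method names are hypothetical. The privileged-read case is modelled as raising an exception, which stands in for whatever the core actually does on a problem-state read.

```python
MASK64 = (1 << 64) - 1
LOW32 = 0xFFFF_FFFF

TB_CTRL_FREEZE = 1 << 0      # bit 0: freeze all timebases
TB_CTRL_PRIV_READ = 1 << 1   # bit 1: reading the timebase is privileged


class CoreTimebase:
    """Per-core timebase; reset only by the SoC reset in the real design."""

    def __init__(self):
        self.value = 0

    def tick(self, tb_ctrl):
        if not tb_ctrl & TB_CTRL_FREEZE:
            self.value = (self.value + 1) & MASK64

    def mtspr_tblw(self, v):
        """Write the lower 32 bits, keeping the upper 32."""
        self.value = (self.value & ~LOW32 & MASK64) | (v & LOW32)

    def mtspr_tbuw(self, v):
        """Write the upper 32 bits, keeping the lower 32."""
        self.value = ((v & LOW32) << 32) | (self.value & LOW32)

    def mftb(self, tb_ctrl, privileged):
        if tb_ctrl & TB_CTRL_PRIV_READ and not privileged:
            raise PermissionError("timebase read is privileged")  # stand-in
        return self.value
```

One plausible sync sequence under this model is: set bit 0, write the same value into every core's timebase, then clear bit 0.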
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
SPR numbers 808-811 do nothing when read or written; that is, mfspr
doesn't modify the destination register. This is accomplished in the
same way that privileged mfspr to an unimplemented SPR is made a
no-op, by supplying the old contents of the destination register as an
input and writing that same value back.
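This sketch shows the write-back trick for the no-op SPRs. `read_spr` is a hypothetical callable standing in for the normal SPR read path.

```python
NOP_SPRS = range(808, 812)   # SPRs 808-811 inclusive


def execute_mfspr(spr, rt_old, read_spr):
    """Return the value written back to RT.

    For the no-op SPRs, the old RT contents are supplied as an input and
    written straight back, so RT appears unmodified.
    """
    if spr in NOP_SPRS:
        return rt_old
    return read_spr(spr)
```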
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Of the defined aspect bits (which are all read-write), only the NPHIE
and PHIE bits have any function at all, since Microwatt is an in-order
single-issue machine and never does any branch speculation. Also,
since there is no privileged non-hypervisor mode, the high 32 bits of
DEXCR do nothing.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This moves HASHKEYR and HASHPKEYR to the SPR RAM that also stores
things such as SRR0/1, LR and CTR. For hashst[p] and hashchk[p]
instructions, execute1 reads the relevant key register from the RAM
and sends it to loadstore1. This saves several LUTs.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Instead of a single input_reg_b_t field in the decode table which
selected both whether input B is a register or a constant and which
constant (immediate value) to use, we now have one field which selects
whether input B is immediate (constant), a GPR, or an FPR, and a
separate field to select which sort of immediate value to use. This
results in simpler logic and better timing.
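The two-field scheme can be sketched like this. The enum names and the three immediate kinds are hypothetical examples, not the actual decode-table values. The field extraction follows the standard D-form layout: a 16-bit D field in the low bits and RB/FRB at IBM bits 16:20.

```python
from enum import Enum


class InputB(Enum):       # hypothetical: what sort of input B is
    IMM = 0
    GPR = 1
    FPR = 2


class ImmKind(Enum):      # hypothetical: which immediate to use
    SI = 0                # signed 16-bit D field
    UI = 1                # unsigned 16-bit D field
    SI_HI = 2             # signed 16-bit field shifted left 16


MASK64 = (1 << 64) - 1


def sext16(v):
    """Sign-extend a 16-bit value."""
    return v - 0x10000 if v & 0x8000 else v


def operand_b(sel, imm_kind, insn, gpr, fpr):
    """Operand B from the two separate decode fields."""
    rb = (insn >> 11) & 0x1F           # RB field, bits 16:20 in IBM numbering
    if sel is InputB.GPR:
        return gpr[rb]
    if sel is InputB.FPR:
        return fpr[rb]
    d = insn & 0xFFFF
    if imm_kind is ImmKind.SI:
        return sext16(d) & MASK64
    if imm_kind is ImmKind.UI:
        return d
    return (sext16(d) << 16) & MASK64
```

Whether a register read is needed depends on `sel` alone, without decoding the immediate kind.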
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
These provide facilities similar to hashst, hashchk and HASHKEYR, but
restricted to privileged mode.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Previously the computation of whether an instruction is privileged or
not was done based on the insn_type. However, that meant that l*cix
(OP_LOAD) and st*cix (OP_STORE) couldn't be made privileged, and
neither could tlbsync (OP_NOP).
Instead, this adds a field to the main instruction decode table to
indicate privileged instructions, and makes the cache-inhibited loads
and stores privileged, along with tlbsync.
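A sketch of the per-entry privileged flag, using a hypothetical excerpt of the decode table. The example shows why insn_type alone is not enough: ld and ldcix share OP_LOAD but differ in privilege. MSR[PR] = 1 means problem (user) state.

```python
from typing import NamedTuple


class DecodeEntry(NamedTuple):
    insn_type: str
    privileged: bool


# Hypothetical excerpt: ldcix shares OP_LOAD with ld but is privileged.
DECODE_TABLE = {
    "ld":      DecodeEntry("OP_LOAD", False),
    "ldcix":   DecodeEntry("OP_LOAD", True),
    "std":     DecodeEntry("OP_STORE", False),
    "stdcix":  DecodeEntry("OP_STORE", True),
    "tlbsync": DecodeEntry("OP_NOP", True),
}


def privilege_violation(mnemonic, msr_pr):
    """True if the instruction must trap: privileged and in problem state."""
    return DECODE_TABLE[mnemonic].privileged and msr_pr == 1
```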
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
These are done in loadstore1. The HashDigest function is computed in
9 cycles; for 8 cycles, a state machine does 4 steps of key expansion
per cycle, and for each of 4 lanes of data, does 4 steps of ciphering;
then there is 1 cycle to combine the results into the final hash
value.
At present, hashchk does not overlap the computation of the hash with
fetching of data from memory (in the case of a cache miss).
The 'is_signed' field in the instruction decode table is used to
distinguish hashst and hashchk from ordinary loads and stores. We
have a new 'RBC' value for input_reg_c_t which says that we are
reading RB but we want the value to come in via the C port; this is
because we want the 5-bit immediate offset on the B port.
Note that in the list of insn_code values, hashst/chk have been put in
the section for instructions with an RB operand, which is not strictly
correct given that the B port is used for the immediate D operand;
however, adding them to the section for instructions without an RB
operand would have made that section exceed 128 entries, causing
changes to the padding needed. The only downside to having hashst/chk
where they are is that the debug logic can't use the RB port to read
GPRs/FPRs when a hashst/chk instruction is being decoded.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Instead of doing the address subtractions and subsequent logic for
DAWR hit detection in the second cycle of a load or store, this does
the subtractions in the first cycle and the remaining logic in the
second cycle. This improves timing.
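The split can be sketched as two functions: the first does only the subtractions and the second does the rest. The inclusive [start, end] watch range is a simplifying assumption and not the architected DAWR/DAWRX encoding. The match test is the usual interval-overlap check.

```python
def dawr_stage1(addr, size, start, end):
    """Cycle 1: only the subtractions (inclusive range [start, end])."""
    d_end = end - addr                   # >= 0 when access starts at or before end
    d_start = (addr + size - 1) - start  # >= 0 when access ends at or after start
    return d_end, d_start


def dawr_stage2(d_end, d_start, enabled):
    """Cycle 2: remaining logic, here just sign checks and the enable."""
    return enabled and d_end >= 0 and d_start >= 0
```

In hardware the differences would be registered between the two cycles. Only their signs are needed in the second cycle.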
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
For the sake of overall timing in larger SoCs, remove the early_sel
optimization when there are more than 4 masters.
Also make the ack and stall signals to a particular master depend on
that master's cyc, not on the busy signal, which can depend on any
master's cyc.
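A hypothetical model of the per-master gating. Each master's ack and stall depend only on its own cyc and on whether it is the selected master, not on a shared busy signal. Masters that are not granted are shown as stalled.

```python
def arbiter_outputs(selected, cyc, slave_ack, slave_stall):
    """Per-master ack/stall, gated by that master's own cyc (hypothetical).

    selected: index of the master currently granted the bus.
    cyc:      list of each master's cyc signal.
    """
    acks, stalls = [], []
    for i, c in enumerate(cyc):
        granted = c and i == selected
        acks.append(slave_ack and granted)
        stalls.append(slave_stall if granted else True)
    return acks, stalls
```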
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This implements the server field in the XISRs (external interrupt
source registers), allowing each interrupt source to be directed to a
particular CPU. If the CPU number that is written is out of range,
CPU 0 is used.
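A sketch of the routing rule, with a hypothetical helper that groups pending sources by target CPU. An out-of-range server number falls back to CPU 0, as described above.

```python
def route_interrupt(server, num_cpus):
    """CPU that receives an interrupt whose XISR server field is `server`."""
    return server if 0 <= server < num_cpus else 0


def pending_per_cpu(xisr_servers, pending_sources, num_cpus):
    """Group pending interrupt sources by target CPU."""
    out = {cpu: [] for cpu in range(num_cpus)}
    for src in pending_sources:
        out[route_interrupt(xisr_servers[src], num_cpus)].append(src)
    return out
```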
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>