For multiply and divide operations, execute1 now records the
destination GPR number, RC and OE from the instruction, and the
XER value. This means that the multiply and divide units don't
need to record those values and then send them back to execute1.
This makes the interface to those units a bit simpler. They
simply report an overflow signal along with the result value, and
execute1 takes care of updating XER if necessary.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
With this, the multiplier isn't a separate pipe that decode2 issues
instructions to, but rather is a unit that execute1 sends operands
to and which sends the result back to execute1, which then sends it
to writeback. Execute1 now sends a stall signal when it gets a
multiply instruction until it gets a valid signal back from the
multiplier.
This all means that we no longer need to mark the multiply
instructions as single-issue.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The carry is currently internal to execute1. We don't handle any of
the other XER fields.
This creates type called "xer_common_t" that contains the commonly
used XER bits (CA, CA32, SO, OV, OV32).
The value is stored in the CR file (though it could be a separate
module). The rest of the bits will be implemented as a separate
SPR and the two parts reconciled in mfspr/mtspr in latter commits.
We always read XER in decode2 (there is little point not to)
and send it down all pipeline branches as it will be needed in
writeback for all type of instructions when CR0:SO needs to be
updated (such forms exist for all pipeline branches even if we don't
yet implement them).
To avoid having to track XER hazards, we forward it back in EX1. This
assumes that other pipeline branches that can modify it (mult and div)
are running single issue for now.
One additional hazard to beware of is an XER:SO modifying instruction
in EX1 followed immediately by a store conditional. Due to our writeback
latency, the store will go down the LSU with the previous XER value,
thus the stcx. will set CR0:SO using an obsolete SO value.
I doubt there exist any code relying on this behaviour being correct
but we should account for it regardless, possibly by ensuring that
stcx. remain single issue initially, or later by adding some minimal
tracking or moving the LSU into the same pipeline as execute.
Missing some obscure XER affecting instructions like addex or mcrxrx.
[paulus@ozlabs.org - fix CA32 and OV32 for OP_ADD, fix order of
arguments to set_ov]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds code to writeback to format data and test the result
against zero for the purpose of setting CR0. The data formatter
is able to shift and mask by bytes and do byte reversal and sign
extension. It can also put together bytes from two input
doublewords to support unaligned loads (including unaligned
byte-reversed loads).
The data formatter starts with an 8:1 multiplexer that is able
to direct any byte of the input to any byte of the output. This
lets us rotate the data and simultaneously byte-reverse it.
The rotated/reversed data goes to a register for the unaligned
cases that overlap two doublewords. Then there is per-byte logic
that does trimming, sign extension, and splicing together bytes
from a previous input doubleword (stored in data_latched) and the
current doubleword. Finally the 64-bit result is tested to set
CR0 if rc = 1.
This removes the RC logic from the execute2, multiply and divide
units, and the shift/mask/byte-reverse/sign-extend logic from
loadstore2.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This seems dependent on the FPGA type/size, so we should probably
make it a toplevel generic, but for now this helps on the
Arty A7-35
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We no longer gate multiply with the valid signal, so it's complaining
a lot. Comment out the warning.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>