This makes the interface to the multiplier more general so an instance
of it can be used in the FPU. It now has a 128-bit addend that is
added on to the product. Instead of an input to negate the output,
it now has a "not_result" input to complement the output. Execute1
uses not_result=1 and addend=-1 to get the effect of negating the
output. The interface is defined this way because this is what can
be done easily with the Xilinx DSP slices in xilinx-mult.vhdl.
This also adds clock enable signals to the DSP slices, mostly for the
sake of reducing power consumption.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This puts the logic that selects which bits of the multiplier result
get written into the destination GPR into execute1, moved out from
multiply.
The multiplier is now expected to do an unsigned multiplication of
64-bit operands, optionally negate the result, detect 32-bit
or 64-bit signed overflow of the result, and return a full 128-bit
result.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
For multiply and divide operations, execute1 now records the
destination GPR number, RC and OE from the instruction, and the
XER value. This means that the multiply and divide units don't
need to record those values and then send them back to execute1.
This makes the interface to those units a bit simpler. They
simply report an overflow signal along with the result value, and
execute1 takes care of updating XER if necessary.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
With this, the multiplier isn't a separate pipe that decode2 issues
instructions to, but rather is a unit that execute1 sends operands
to and which sends the result back to execute1, which then sends it
to writeback. Execute1 now sends a stall signal when it gets a
multiply instruction until it gets a valid signal back from the
multiplier.
This all means that we no longer need to mark the multiply
instructions as single-issue.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds code to writeback to format data and test the result
against zero for the purpose of setting CR0. The data formatter
is able to shift and mask by bytes and do byte reversal and sign
extension. It can also put together bytes from two input
doublewords to support unaligned loads (including unaligned
byte-reversed loads).
The data formatter starts with an 8:1 multiplexer that is able
to direct any byte of the input to any byte of the output. This
lets us rotate the data and simultaneously byte-reverse it.
The rotated/reversed data goes to a register for the unaligned
cases that overlap two doublewords. Then there is per-byte logic
that does trimming, sign extension, and splicing together bytes
from a previous input doubleword (stored in data_latched) and the
current doubleword. Finally the 64-bit result is tested to set
CR0 if rc = 1.
This removes the RC logic from the execute2, multiply and divide
units, and the shift/mask/byte-reverse/sign-extend logic from
loadstore2.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>