We now expect the overflow signal from the multiplier to arrive one
cycle later than the product. This breaks up a long combinational
path and improves timing.
This also changes some uses of v.<field> to r.<field> in the slow
op logic, which should help timing as well.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

This makes the interface to the multiplier more general so that an
instance of it can be used in the FPU. It now has a 128-bit addend
that is added to the product. Instead of an input to negate the
output, it now has a "not_result" input to complement the output.
Execute1 uses not_result=1 and addend=-1 to get the effect of negating
the output, since complementing (product - 1) yields -product in
two's complement. The interface is defined this way because this is
what can be done easily with the Xilinx DSP slices in xilinx-mult.vhdl.
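
The negation described above can be checked with a small behavioural
sketch. This is Python, not the VHDL in the tree; the names
multiply_model and negated_product and the 128-bit truncation are
assumptions made for illustration only.

```python
MASK128 = (1 << 128) - 1  # assumed result width: 128 bits


def multiply_model(a, b, addend=0, not_result=False):
    """Hypothetical model: a * b + addend, optionally bitwise-complemented."""
    result = (a * b + addend) & MASK128
    if not_result:
        result = ~result & MASK128
    return result


def negated_product(a, b):
    # addend = -1 is all ones in 128 bits; complementing (a*b - 1)
    # gives -(a*b) modulo 2**128.
    return multiply_model(a, b, addend=MASK128, not_result=True)
```

For example, negated_product(3, 5) equals (-15) & MASK128, the 128-bit
two's-complement encoding of -15.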
This also adds clock enable signals to the DSP slices, mainly to
reduce power consumption.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

This adds a custom implementation of the multiplier which uses 16
DSP48E1 slices to do a 64x64-bit multiplication in 2 cycles.
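
As a rough illustration of why several narrow multipliers are needed,
the Python sketch below builds an unsigned 64x64 multiply from small
partial products. The chunk widths (24 and 17 bits, the unsigned range
that fits the 25x18 signed multiplier in a DSP48E1) and the function
names are assumptions; the actual operand split, slice allocation and
pipelining in this commit are not described here and may differ.

```python
def chunks(value, width, total_bits=64):
    """Split an unsigned value into little-endian chunks of `width` bits."""
    mask = (1 << width) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, width)]


def multiply_by_parts(a, b, a_width=24, b_width=17):
    """Unsigned 64x64 multiply assembled from narrow partial products.

    Each a_chunk * b_chunk stands in for one narrow hardware multiply;
    the shifted accumulation stands in for the adder chain.
    The widths are illustrative assumptions.
    """
    total = 0
    for i, a_chunk in enumerate(chunks(a, a_width)):
        for j, b_chunk in enumerate(chunks(b, b_width)):
            total += (a_chunk * b_chunk) << (i * a_width + j * b_width)
    return total
```

With these assumed widths the operands split into 3 and 4 chunks, so
the sketch forms 12 partial products. It says nothing about how the 16
slices in the real design are used.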
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>