Commit Graph

7 Commits (master)

Author SHA1 Message Date
Paul Mackerras e02d8060ed Change the multiplier interface to support signed multipliers
This adds an 'is_signed' signal to MultiplyInputType to indicate
whether the data1 and data2 fields are to be interpreted as signed or
unsigned numbers.

The 'not_result' field is replaced by a 'subtract' field which
provides a more intuitive interface for requesting that the product be
subtracted from the addend rather than added, i.e. subtract = 1 gives
C - A * B, vs. subtract = 0 giving C + A * B.  (Previously the users
of the multipliers got the same effect by complementing the addend and
setting not_result = 1.)

The is_32bit field is removed because it is no longer used now that we
have a separate 32-bit multiplier.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras af9fe3467c xilinx_mult: Prepare for doing signed multiplication
This rearranges the way that partial products are generated and summed
so that the partial products that could be negative in a signed
multiplier are now sign-extended.  The inputs are still zero-extended,
however.

The overflow detection logic now only detects 64-bit overflow, since
32-bit multiplications are handled in a separate multiplier.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 734e4c4a52 core: Add a short multiplier
This adds an optional 16 bit x 16 bit signed multiplier and uses it
for multiply instructions that return the low 64 bits of the product
(mull[dw][o] and mulli, but not maddld) when the operands are both in
the range -2^15 .. 2^15 - 1.   The "short" 16-bit multiplier produces
its result combinatorially, so a multiply that uses it executes in one
cycle.  This improves the coremark result by about 4%, since coremark
does quite a lot of multiplies and they almost all have operands that
fit into 16 bits.

The presence of the short multiplier is controlled by a generic at the
execute1, SOC, core and top levels.  For now, it defaults to off for
all platforms, and can be enabled using the --has_short_mult flag to
fusesoc.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 0cdaa2778f xilinx-mult: Move some registers later in the data flow
This changes s0 to use the P register rather than the A/B/C input
registers, thus improving the timing of the multiplier output.  The
m00, m02 and m03 multipliers now use their P registers rather than the
M registers, moving the addition they do from the second cycle to the
first.

Also, the XOR that inverts the 32 LSBs is moved before the output
register.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras f1238299bd execute1: Take an extra cycle for OE=1 multiply instructions
We now expect the overflow signal from the multiplier to come along
one cycle later than the product.

This breaks up a long combinatorial path and improves timing.

This also changes some uses of v.<field> to r.<field> in the slow
op logic, which should help timing as well.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
Paul Mackerras 535341961d multiplier: Generalize interface to the multiplier
This makes the interface to the multiplier more general so an instance
of it can be used in the FPU.  It now has a 128-bit addend that is
added on to the product.  Instead of an input to negate the output,
it now has a "not_result" input to complement the output.  Execute1
uses not_result=1 and addend=-1 to get the effect of negating the
output.  The interface is defined this way because this is what can
be done easily with the Xilinx DSP slices in xilinx-mult.vhdl.

This also adds clock enable signals to the DSP slices, mostly for the
sake of reducing power consumption.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
Paul Mackerras 0809bc898b multiply: Use DSP48 slices for multiplication on Xilinx FPGAs
This adds a custom implementation of the multiplier which uses 16
DSP48E1 slices to do a 64x64 bit multiplication in 2 cycles.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago