This adds a divider unit, connected to the core in much the same way
that the multiplier unit is connected. The division algorithm is
very simple-minded, taking 64 clock cycles for any division (even
32-bit division instructions).
The decoding is simplified by making use of regularities in the
instruction encoding for div* and mod* instructions. Instead of
having PPC_* encodings from the first-stage decoder for each of the
different div* and mod* instructions, we now just have PPC_DIV and
PPC_MOD, and the inputs to the divider that indicate what sort of
division operation to do are derived from instruction word bits.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Right now we continually print all 3 possible GPRs an instruction
may be using. Add signals so we only print GPRs when they are
actually read. This should hopefully optimise away when synthesized.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Handle the CR as a single field with per nibble enables. Forward any
writes in the same cycle.
If this proves to be an issue for timing, we may want to revisit
this in the future. For now, it keeps things simple.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
The decode2 stage was spaghetti code and needed cleaning up.
Create a series of functions to pull fields from a ppc instruction
and also a series of helpers to extract values for the execution
units.
As suggested by Paul, we should pass all signals to the execution
units and only set the valid signal conditionally, which should
use less resources.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>