This adds an explicit multiplexer feeding v.e.write_data in execute1,
with the select lines determined in the previous cycle based on the
insn_type. Similarly, for multiply and divide instructions, there is
now an explicit multiplexer.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
To avoid adding too much logic, this moves the adder used by OP_ADD
out of the case statement in execute1.vhdl so that the result can
be used by OP_ADDG6S as well.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
It's not needed for the other ops (popcnt, parity, etc.) and the
logical unit shows up as a critical path from time to time.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This reduces the number of different things that are assigned to
the result variable.
- The computations for the popcnt, prty, cmpb and exts instruction
families are moved into the logical unit.
- The result of mfspr from the slow SPRs is computed in 'spr_val'
before being assigned to 'result'.
- Writes to LR as a result of a blr or bclr instruction are done
through the exc_write path to writeback.
This eases timing considerably.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This implements logic in the logical entity to calculate the results
of the popcnt* and prty* instructions. We now have one insn_type_t
value for the 3 popcnt variants and one for the two prty variants,
using the length field of the decode_rom_t to distinguish between
them. The implementations in logical.vhdl using recursive
algorithms rather than the simple functions in ppc_fx_insns.vhdl.
This gives a saving of about 140 slice LUTs on the A7-100 and
improves timing slightly.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Consolidate and/andc/nand, or/orc/nor and xor/eqv, using a common
invert on the input and output. This saves us about 200 LUTs.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>