This implements the byte-reverse halfword, word and doubleword
instructions: brh, brw, and brd. These instructions were added to the
ISA in version 3.1. They use a new OP_BREV insn_type value. The
logic for these instructions is implemented in logical.vhdl.
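
As a rough behavioural sketch (a Python model written for illustration, not the VHDL in logical.vhdl), the three instructions reverse the bytes within each halfword, each word, or the whole doubleword of a 64-bit value. The helper name byte_reverse and its width_bytes parameter are assumptions made for this example.

```python
MASK64 = (1 << 64) - 1

def byte_reverse(rs: int, width_bytes: int) -> int:
    """Reverse the bytes inside each width_bytes-sized chunk of a 64-bit value."""
    result = 0
    for chunk in range(0, 8, width_bytes):
        for i in range(width_bytes):
            # Pick byte i of this chunk and place it at the mirrored position.
            byte = (rs >> (8 * (chunk + i))) & 0xFF
            dest = chunk + (width_bytes - 1 - i)
            result |= byte << (8 * dest)
    return result & MASK64

def brh(rs: int) -> int:
    return byte_reverse(rs, 2)  # within each halfword

def brw(rs: int) -> int:
    return byte_reverse(rs, 4)  # within each word

def brd(rs: int) -> int:
    return byte_reverse(rs, 8)  # across the whole doubleword
```

For example, brh(0x0102030405060708) gives 0x0201040306050807 in this model.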
In order to avoid going over 64 insn_type values, OP_AND and OP_OR
were combined into OP_LOGIC, which is like OP_AND except that the RS
input can be inverted as well as the RB input. The various forms of
OR instruction are then implemented using the identity
a OR b = NOT (NOT a AND NOT b)
The 'is_signed' field of the instruction decode table is used to
indicate that RS should be inverted.
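
A minimal Python sketch of how a single AND with optional inversions could cover both families. The function op_logic and the flag names invert_rs, invert_rb and invert_out are hypothetical; the flag settings shown for each instruction follow from De Morgan's law and are not copied from the decode table.

```python
MASK64 = (1 << 64) - 1

def op_logic(rs, rb, invert_rs=False, invert_rb=False, invert_out=False):
    """Model of one AND gate array with optional inversion of each input and the output."""
    a = (rs ^ MASK64) if invert_rs else rs
    b = (rb ^ MASK64) if invert_rb else rb
    r = a & b
    if invert_out:
        r ^= MASK64
    return r & MASK64

# Flag settings derived from a OR b = NOT (NOT a AND NOT b); assumptions, not decode-table values.
def and_(rs, rb):  return op_logic(rs, rb)
def andc(rs, rb):  return op_logic(rs, rb, invert_rb=True)
def nand(rs, rb):  return op_logic(rs, rb, invert_out=True)
def or_(rs, rb):   return op_logic(rs, rb, invert_rs=True, invert_rb=True, invert_out=True)
def orc(rs, rb):   return op_logic(rs, rb, invert_rs=True, invert_out=True)
def nor(rs, rb):   return op_logic(rs, rb, invert_rs=True, invert_rb=True)
```

In this model only the OR forms need RS inverted, which is consistent with needing one extra decode bit for that purpose.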
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
With this, the register file now contains 64 entries, for 32 GPRs and
32 FPRs, rather than the 128 it had previously. Several things get
simplified - decode1 no longer has to work out the ispr{1,2,o} values,
decode_input_reg_{a,b,c} no longer have the t = SPR case, etc.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This moves the calculation of the result for popcnt* into the
countbits unit, renamed from countzero, so that we can take two cycles
to get the result. The motivation for this is that the popcnt*
calculation was showing up as a critical path.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
- mcrxrx put the bits in the wrong order
- addpcis was setting CR0 if the instruction bit 0 = 1, which it
shouldn't
- bpermd was producing 0 always and additionally had the wrong bit
numbering
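
For reference, a Python sketch of bpermd as I read the ISA pseudo-code, which uses big-endian bit numbering (bit 0 is the most significant bit). This is an illustrative model with assumed names, not the execute1 implementation.

```python
def bpermd(rs: int, rb: int) -> int:
    """Gather eight bits of RB, selected by the eight bytes of RS, into the low byte."""
    result = 0
    for i in range(8):
        # Byte i in big-endian numbering: i = 0 is the most significant byte of RS.
        index = (rs >> (56 - 8 * i)) & 0xFF
        if index < 64:
            # Big-endian bit 'index' of RB is little-endian bit 63 - index.
            bit = (rb >> (63 - index)) & 1
        else:
            bit = 0
        # perm_0 is the most significant of the eight result bits.
        result |= bit << (7 - i)
    return result
```

In this model, a selector byte of 0 picks the most significant bit of RB, and any selector of 64 or more yields a 0 bit.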
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds an explicit multiplexer feeding v.e.write_data in execute1,
with the select lines determined in the previous cycle based on the
insn_type. Similarly, for multiply and divide instructions, there is
now an explicit multiplexer.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
To avoid adding too much logic, this moves the adder used by OP_ADD
out of the case statement in execute1.vhdl so that the result can
be used by OP_ADDG6S as well.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
It's not needed for the other ops (popcnt, parity, etc.) and the
logical unit shows up as a critical path from time to time.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This reduces the number of different things that are assigned to
the result variable.
- The computations for the popcnt, prty, cmpb and exts instruction
families are moved into the logical unit.
- The result of mfspr from the slow SPRs is computed in 'spr_val'
before being assigned to 'result'.
- Writes to LR as a result of a blr or bclr instruction are done
through the exc_write path to writeback.
This eases timing considerably.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This implements logic in the logical entity to calculate the results
of the popcnt* and prty* instructions. We now have one insn_type_t
value for the 3 popcnt variants and one for the two prty variants,
using the length field of the decode_rom_t to distinguish between
them.  The implementations in logical.vhdl use recursive
algorithms rather than the simple functions in ppc_fx_insns.vhdl.
This gives a saving of about 140 slice LUTs on the A7-100 and
improves timing slightly.
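
One way to picture a recursive popcnt is as a tree of pairwise sums, where the per-byte counts are an intermediate level that the word and doubleword counts build on. The Python sketch below is an assumption about the general shape, not a transcription of logical.vhdl; the length argument (1, 4 or 8 bytes) stands in for the decode length field.

```python
def pair_sums(values):
    """Add adjacent pairs, halving the number of partial counts."""
    return [values[2 * i] + values[2 * i + 1] for i in range(len(values) // 2)]

def popcnt(rs: int, length: int) -> int:
    """Population count per byte (length=1), per word (4) or per doubleword (8)."""
    sums = [(rs >> i) & 1 for i in range(64)]  # level 0: individual bits
    width = 1
    while width < 8 * length:
        sums = pair_sums(sums)  # each level doubles the field width
        width *= 2
    # Pack each count into its own field of the result.
    result = 0
    for i, count in enumerate(sums):
        result |= count << (width * i)
    return result
```

In hardware the levels would be computed once and shared between the variants; the loop here simply stops at the level that the requested length needs.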
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Consolidate and/andc/nand, or/orc/nor and xor/eqv, using a common
invert on the input and output. This saves us about 200 LUTs.
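
A small Python sketch of the idea, assuming the inverted input is RB and using hypothetical names (logical, invert_in, invert_out): three base operations plus two inversion flags cover all eight instructions.

```python
MASK64 = (1 << 64) - 1

def logical(op: str, rs: int, rb: int, invert_in=False, invert_out=False) -> int:
    """Base operation with an optional inverter on RB and on the result."""
    b = (rb ^ MASK64) if invert_in else rb
    if op == "and":
        r = rs & b
    elif op == "or":
        r = rs | b
    elif op == "xor":
        r = rs ^ b
    else:
        raise ValueError("unknown op: " + op)
    return ((r ^ MASK64) if invert_out else r) & MASK64

# andc = logical("and", rs, rb, invert_in=True)
# nand = logical("and", rs, rb, invert_out=True)
# orc  = logical("or",  rs, rb, invert_in=True)
# nor  = logical("or",  rs, rb, invert_out=True)
# eqv  = logical("xor", rs, rb, invert_out=True)
```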
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>