Commit Graph

128 Commits (55b6f8be5237d1360c8e3ba442334b20e7d77420)

Author SHA1 Message Date
Paul Mackerras 9646fe28b0 Do sign-extension instructions in writeback instead of execute1
This makes the exts[bhw] instructions do the sign extension in the
writeback stage using the sign-extension logic there instead of
having unique sign extension logic in execute1.  This requires
passing the data length and sign extend flag from decode2 down
through execute1 and execute2 and into writeback.  As a side bonus
we reduce the number of values in insn_type_t by two.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Paul Mackerras 86c53aa3f7 Implement neg using OP_ADD
We have all the machinery in place to implement the neg instruction
as OP_ADD.  Doing that means we can ditch OP_NEG, and saves about
66 slice LUTs on the A7-100.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Anton Blanchard 57b7f1ed71 Don't infer latch for newcrf
Always initialize newcrf to avoid inferring a latch.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Paul Mackerras 24a4a796ce execute: Consolidate count-leading/trailing-zeroes implementations
This adds combinatorial logic that does 32-bit and 64-bit count
leading and trailing zeroes in one unit, and consolidates the
four instructions under a single OP_CNTZ opcode.

This saves 84 slice LUTs on the Arty A7-100.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Anton Blanchard b8fb721b81 Consolidate logical instructions
Consolidate and/andc/nand, or/orc/nor and xor/eqv, using a common
invert on the input and output. This saves us about 200 LUTs.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Paul Mackerras f7c393ba7e Add a rotate/mask/shift unit and use it in execute1
This adds a new entity 'rotator' which contains combinatorial logic
for rotating and masking 64-bit values.  It implements the operations
of the rlwinm, rlwnm, rlwimi, rldicl, rldicr, rldic, rldimi, rldcl,
rldcr, sld, slw, srd, srw, srad, sradi, sraw and srawi instructions.
It consists of a 3-stage 64-bit rotator using 4:1 multiplexors at
each stage, two mask generators, output logic and control logic.

The insn_type_t values used for these instructions have been reduced
to just 5: OP_RLC, OP_RLCL and OP_RLCR for the rotate and mask
instructions (clear both left and right, clear left, clear right
variants), OP_SHL for left shifts, and OP_SHR for right shifts.
The control signals for the rotator are derived from the opcode
and from the is_32bit and is_signed fields of the decode_rom_t.

The rotator is instantiated as an entity in execute1 so that we can
be sure we only have one of it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Paul Mackerras 7fe84220a5 decode: Avoid multiplexing from instruction reg fields to regfile address ports
This aims to simplify the logic between the instruction image and
the register file read address ports and reduce the size of the decode
tables.  With this patch, the input_reg_a column of the decode tables
can only select RA or zeroes, the input_reg_b column can only select
RB or a constant (0, -1, or an immediate value from the instruction),
and the input_reg_c columns can only select RS or zeroes.

That means that the rotate/shift/logical ops now have their first
input coming in via the input_reg_c column.  That means we need to
add a read_data3 field to the Decode2ToExecuteType record, but that
will go away again when we split out the rotate/mask/logical ops to
their own unit.

As a related but not tightly connected change, this patch also sets
the read1_enable signal to the register file be 0 when RA=0 and the
input_reg_a for the instruction is RA_OR_ZERO (previously it was 1).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Paul Mackerras 96b402a4bf Consolidate add/subtract instructions into a single op
All of the PPC add and subtract instructions, including carrying
and extended versions, do much the same arithmetic operation:

	result = (I xor A) + B + C

where A is the value from RA, I provides a logical inversion of A
(i.e. I is 0 or -1), B is either from RB or is a constant 0 or -1,
and C is 0, 1 or the carry bit from XER (CA).

To consolidate all the add/subtract instructions into a single
OP_ADD, we add a column to decode_rom_t to indicate when A should
be inverted, and change the input_carry field to a 3-state selector
to select C in the equation above.

This also adds a new "CONST_M1" value for input_reg_b_t to indicate
that B is a constant -1.  This allows us to implement addme and
subfme.

The addex instruction appears not to exist, so the comments referring
to it are removed.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Paul Mackerras 58b06eb5f3 decode: Remove const fields from decode_rom_t
The const* fields of decode_rom_t drove multiplexers in decode2 that
picked out various instruction fields and put them into the const*
fields of the Decode2ToExecute1Type record, from where they were
used in execute1.  However, the code in execute1 can just as easily
use the appropriate fields of the original instruction word, since
that is now available in execute1.  This therefore changes the
code to do that, resulting in smaller decode tables.

Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Paul Mackerras bbae2d1eda decode: Index minor op table with insn bits for opcode 31
This changes decode_op_31_array from being indexed by a ppc_insn_t
(which is derived from the instruction word by a whole series of
if/elsif statements) to being indexed directly by bits 10...1 of
the instruction word.  With this we no longer need ppc_insn.

This then means that the decode1 stage doesn't distinguish between
mfcr and mfocrf, or between mtcrf and mtocrf, since those are
distinguished by the value in bit 20 of the instruction.  To
accommodate that, execute1 changes so that the one op value (OP_MFCR)
does either the mfcr or the mfocrf behaviour depending on bit 20
of the instruction word; and similarly for mtcrf/mtocrf.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Paul Mackerras 21d3f8a5ed decode: Index minor op table with insn bits for opcode 30
This comprises the 64-bit rotate and mask instructions.  In order to
reduce the table index to 3 bits, we combine rldcl and rdlcr into a
single op (OP_RLDCX), and choose the right mask at execute time based
on bit 1 of the instruction word.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Paul Mackerras 00e9f801f6 decode: Index minor op table with insn bits for opcode 19
This changes the decoding of major opcode 19 from using the ppc_insn_t
index to using bits of the instruction word directly.  Opcode 19 has
a 10-bit minor opcode field (bits 10..1) but the space is sparsely
filled.  Therefore we index a table of single-bit entries with the
10-bit minor opcode to filter out the illegal minor opcodes, and
index a table using just 3 bits -- 5, 3 and 2 -- of the instruction
to get the decode entry.  This groups together all the instructions
in 4 columns of the opcode map as a single entry.  That means that
mcrf and all the CR logical ops get grouped together, and bcctr, bclr
and bctar get grouped together.  At present the CR logical ops are not
implemented, so their grouping has no impact.

The code for bclr and bcctr in execute1 is now common, using a single
op, and it now determines the branch address by looking at bit 10 of
the instruction word at execute time.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Paul Mackerras c9e92483b8 decode: Push mtspr/mfspr register decoding down into execute1
Instead of doing mfctr, mflr, mftb, mtctr, mtlr as separate ops,
just pass down mfspr and mtspr ops with the spr number and let
execute1 decode which SPR we're addressing.  This will help reduce
the number of instruction bits decode1 needs to look at.

In fact we now pass down the whole instruction from decode2 to
execute1.  We will need more bits of the instruction in future,
and the tools should just optimize away any that we don't end
up using.  Since the 'aa' bit was just a copy of an instruction
bit, we can now remove it from the record.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Benjamin Herrenschmidt 3e6f656a90 Add MCRF instruction
Hopefully it's not too timing catastrophic. The variable newcrf will
be handy for the other CR ops when we implement them I suspect.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Benjamin Herrenschmidt 554ae88540 Implement absolute branches
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Benjamin Herrenschmidt 80a0e7fcf3 execute1: simplify flush_out
It's always set when f_out.redirect is set, so may as well set it once
at the end. It's all combo from the register.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Anton Blanchard b57325ce29 Merge branch 'divider' of https://github.com/paulusmack/microwatt 5 years ago
Anton Blanchard 5a6f8d26d1 Rename OP_SUBFC -> OP_SUBFE, OP_ADDC -> OP_ADDE
These were somewhat badly named.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Paul Mackerras d5bc6c8824 Add a divider unit and a testbench for it
This adds a divider unit, connected to the core in much the same way
that the multiplier unit is connected.  The division algorithm is
very simple-minded, taking 64 clock cycles for any division (even
32-bit division instructions).

The decoding is simplified by making use of regularities in the
instruction encoding for div* and mod* instructions.  Instead of
having PPC_* encodings from the first-stage decoder for each of the
different div* and mod* instructions, we now just have PPC_DIV and
PPC_MOD, and the inputs to the divider that indicate what sort of
division operation to do are derived from instruction word bits.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Anton Blanchard 6d85920068 execute1 no longer needs sim_console
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Michael Neuling 1e1b799382 Remove FIXME comment
This was mistakenly left behind in 4d5abfb430 ("Remove dynamic
ranges from code")

Signed-off-by: Michael Neuling <mikey@neuling.org>
5 years ago
Anton Blanchard a2df2a10a2 Remove sim console
We can force all existing code to use the UART console
by passing 0 in bit zero of the sim config register.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard a8f8c54b77 Move debug execute output into decode2
This covers all units, and we avoid double printing.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 92a7152370 Rework pipeline, add stall and flush signals
This adds stall and flush signals to the pipeline.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Michael Neuling 4d5abfb430 Remove dynamic ranges from code
Some VHDL compilers like verific [1] don't like these, so let's remove
them. Lots of random code changes, but passes make check.

Also add basic script to run verific and generate verilog.

1. https://www.verific.com/

Signed-off-by: Michael Neuling <mikey@neuling.org>
5 years ago
Anton Blanchard 0fd18c2455 Add srd and srw
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 73daacbcd4 Add sim only divw
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 5a29cb4699 Initial import of microwatt
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago