This splits out the decoding done in the decode0 step into a separate
predecoder, used when writing instructions into the icache. The
icache now holds 36 bits per instruction rather than 32. For valid
instructions, those 36 bits comprise the bottom 26 bits of the
instruction word, a 9-bit insn_code value (which uniquely identifies
the instruction), and a zero in the MSB. For illegal instructions,
the MSB is one and the full instruction word is in the bottom 32 bits.
Having the full instruction word available for illegal instructions
means that it can be printed in the log when simulating, or in future
could be placed in the HEIR register.
If we don't have an FPU, then the floating-point instructions are
regarded as illegal. In that case, the insn_code values would fit
into 8 bits, which could be used in future to reduce the size of
decode_rom from 512 to 256 entries.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The global wr_en signal is causing Vivado to generate two TDP (True Dual Port)
block RAMs instead of one SDP (Simple Dual Port) for each cache way. Remove
it and instead apply a AND to the individual byte write enables.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This makes the BRAMs use an output buffer, introducing an extra
cycle latency. Without this, Vivado won't make timing at 100Mhz.
We stash all the necessary response data in delayed latches, the
extra cycle is NOT a state in the state machine, thus it's fully
pipelined and doesn't involve stalling.
This introduces an extra non-pipelined cycle for loads with update
to avoid collision on the writeback output between the now delayed
load data and the register update. We could avoid it by moving
the register update in the pipeline bubble created by the extra
update state, but it's a bit trickier, so I leave that for a latter
optimization.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds support for set associativity to the icache. It can still
be direct mapped by setting NUM_WAYS to 1.
The replacement policy uses a simple tree-PLRU for each set.
This is only lightly tested, tests pass but I have to double check
that we are using the ways effectively and not creating duplicates.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>