They hurt timing forcing signals to come from the master and back
again in one cycle. Stall isn't sampled by the master unless there
is an active cycle so masking it with cyc is pointless. Masking acks
is somewhat pointless too as we don't handle early dropping of cyc
in any of our slaves properly anyways.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It will look for an ELF binary at the flash offset specified
for the board (currently 0x300000 on Arty but that could be
changed).
Note: litedram is regenerated in order to rebuild the init code,
which was done using a newer version of litedram from LiteX.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This require the s25fl128s.vhd flash model and FMF libraries,
which will be built when passed to the Makefile via the
FLASH_MODEL_PATH argument. Otherwise a dummy module is used
which ties MISO to '1'.
The model isn't included as I'm not sure its licence (GPL) is
at this point, but it can be obtained from
https://github.com/ozbenh/microspi
FLASH_MODEL_PATH=<path to microspi>/model
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This stores the output of the PLRU big mux and clears the
tags and valid bits on the next cycle.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds an SPI flash controller which supports direct
memory-mapped access to the flash along with a manual
mode to send commands.
The direct mode can be set via generic to default to single
wire or quad mode. The controller supports normal, dual and quad
accesses with configurable commands, clock divider, dummy clocks
etc...
The SPI clock can be an even divider of sys_clk starting at 2
(so max 50Mhz with our typical Arty designs).
A flash offset is carried via generics to syscon to tell SW about
which portion of the flash is reserved for the FPGA bitfile. There
is currently no plumbing to make the CPU reset past that address (TBD).
Note: Operating at 50Mhz has proven unreliable without adding some
delay to the sampling of the input data. I'm working in improving
this, in the meantime, I'm leaving the default set at 25 Mhz.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is a long timing path to generate the ack signal from
the L2 cache as it's fully combinational for stores, including
signals coming from litedram.
Instead, pipeline the store acks. This will introduce a cycle
latency but should improve timing. Also the core will eventually
be smart enough not to wait for store acks to complete them anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The DRAM related pins have some small changes in LiteX, so resync
and add the false path information as well.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This breaks the long stall signal coming back to the processor
and helps improve overall timing.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, there's a huge mux gathering the output of all the PLRUs
to select the victim way on cache miss. This is fed combinationally
into the clearing of the valid and tags.
In order to help timing, let's store it instead and perform the
clearing on the next cycle. The L2 doesn't respond to requests
when not in IDLE state so this should have no negative effects.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Github workflow gives us longer run times and faster startup.
Major kudos for this goes to @eine for the initial version and for
pushing us in this direction.
Signed-off-by: Michael Neuling <mikey@neuling.org>
litedram build directory used by the generator and the
verilator obj_dir can be taken out
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We still had some wires bringing an extra serial port out of
litedram for the built-in riscv processor. This is all gone now
so take them out.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some fields might get extended with extra bits, use the appropriate
masks when reading the values.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The rewrite of the Makefile to use "ghdl -c" somewhat broke building
the unisim library as ghdl doesn't yet support putting files in
separate libraries from a single command line invocation.
The workaround at the time was to put the entire project in "unisim"
which is ... weird and will break if we try to add another library
such as fmf.
This fixes it by generating the library separately using "ghdl -i"
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The changes in d3c274d01e ("flash-arty: Add support for specifying the file type")
added a local jtagspi.cfg, which meant openocd must be run from the root
of the microwatt directory.
This puts the content into the xilinx-xc7.cfg so the script can be used
from any path again.
Signed-off-by: Joel Stanley <joel@jms.id.au>
icbi currently just resets the icache. This has some nasty side
effects such as also clearing the TLB, but also the wishbone interface.
That means that any ongoing cycle will be dropped.
However, most of our slaves don't handle that well and will continue
sending acks for already issued requests.
Under some circumstances we can thus restart an icache load and get
spurious ack/data from the wishbone left over from the "cancelled"
sequence.
This has broken booting Linux for me.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These were missed earlier when the single-issue flag was turned off on
the other loads and stores by commit 1a244d3470 ("Remove single-issue
constraint for most loads and stores").
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
We don't run these but we should.
The SOC tests have bit rotted. We need to fix them but leave them out
for now.
Signed-off-by: Michael Neuling <mikey@neuling.org>
This increases the number of L2 lines from 32 to 64. The BRAM usage is the
same as they were only half used. There's an increase in LUTs and registers
due to the extra tags and valid bits, but none of it should be in a
space constrained or critical timing path.
We could make it wider instead (256 bytes lines) which would reduce usage
instead, but this increases the latency by 8 cycles. Something to consider
once the L2 is capable of early response on miss and starting reloads
from any point in a line.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
By adding logic to decode2 to be able to send the instruction address
down the A input, and making CONST_DX_HI (renamed to CONST_DXHI4) add
4 to the immediate value (easy since the bottom 16 bits were zero),
we can do addpcis using the main adder. This reduces the width of the
result mux and frees up one value in insn_type_t, since we can now use
OP_ADD for addpcis.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Support for this has bitrotted and would require refactoring of L2 to
be brought back. It's also not really needed anymore now that we ship
pre-generated litedram and that LiteX supports what we do.
So take it out, which simplifies some of the scripts as well. This also
fixes up CSR alignment the sim model.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The test bench test simple access forms for now, it's a starting point
but it already helped find/fix a bug.
Includes a litedram update to be able to operate the sim model without
inits.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds a cache between the wishbone and litedram with the following
features (at this point, it's still evolving)
- 128 bytes line width in order to have a reasonable amount of
litedram pipelining on the 128-bit wide data port.
- Configurable geometry otherwise
- Stores are acked immediately on wishbone whether hit or miss
(minus a 2 cycles delay if there's a previous load response in the
way) and sent to LiteDRAM via 8 entries (configurable) store queue
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>