Recent litedram gets stuck at memtest unless block_until_ready=False.
(discussion in https://github.com/enjoy-digital/litedram/pull/292)
This change regenerates with latest litedram and litex
62abf9c ("litedram_gen: Add block_until_ready port parameter to control blocking behaviour.")
add2746a ("tools/litex_cli: Rename wb to bus.")
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Note: There are a few patches to upstream to fix an upstream breakage
of litedram standalone generator, and fix some issues with liteeth
in the way it's used on Wukong. All these have pending pull requests.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For now only the V2 of the board (slightly different pinout)
and only the A100T variant. I also haven't added GPIOs or anything
else on the PMODs really.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This makes the 64-bit wishbone buses have the address expressed in
units of doublewords (64 bits), and similarly for the 32-bit buses the
address is in units of words (32 bits). This is to comply with the
wishbone spec. Previously the addresses on the wishbone buses were in
units of bytes regardless of the bus data width, which is not correct
and caused problems with interfacing with externally-generated logic.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This is a NiteFury based PCIe M2 form-factor board originally
used for mining. It contains a speed grade 2 Artix 7 200T,
1GB of DDR3 and 32MB of flash.
The serial port is routed to pin 2 (RX) and 3 (TX) of the P2
connector (pin 1 is GND).
Note: Only 16MB of flash is currently usable until code is added
to configure the flash controller to use 4-bytes address commands
on that part.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This regenerate litedram for all targets (genesys2 is new in this
build) using the latest LiteX.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some changes in LiteX broke us. Adapt the build system and
increase the init RAM size to 24KB.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
litedram ignores a couple of signals of his "pseudo-axi" port,
this adds a bit of documentation around it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Make the DRAM data lines and user port width configurable, also
don't hard wire dependency on the wishbone data width.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This implements in the L2 cache the feature already in the L1s
allowing a request to be completed before the end of a refill
using partial line valid bits, and starting a refill from the
row of the first miss on that line instead of the beginning of
the line.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The sdram_init ELF fails to link:
powerpc64le-linux-gnu-ld -static -nostdlib -T sdram_init.lds \
--gc-sections -o sdram_init.elf head.o main.o sdram.o console.o \
libc.o sdram_init.lds
powerpc64le-linux-gnu-ld: error: linker script file 'sdram_init.lds'
appears multiple times
make: *** [Makefile:70: sdram_init.elf] Error 1
This is because sdram_init.lds is one of the prerequisites, and thus is
contained in $^. However, it is also explicitly specified as part of
LDFLAGS, as the argument to -T.
Signed-off-by: Boris Shingarov <shingarov@labware.com>
Use a more generic console_init() instead of potato_uart_init(),
and do the same for interrupt control. There should be no
change in behaviour.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This makes the control bus currently going out of "soc" towards
litedram more generic for external IO devices added by the
top-level rather than inside the SoC proper.
This is mostly renaming of signals and a small change on how the
address decoder operates, using a separate "cascaded" decode for
the external IOs.
We make the region 0xc8nn_nnnn be the "external IO" region for
now.
This will make it easier / cleaner to add more external devices.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It will look for an ELF binary at the flash offset specified
for the board (currently 0x300000 on Arty but that could be
changed).
Note: litedram is regenerated in order to rebuild the init code,
which was done using a newer version of litedram from LiteX.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is a long timing path to generate the ack signal from
the L2 cache as it's fully combinational for stores, including
signals coming from litedram.
Instead, pipeline the store acks. This will introduce a cycle
latency but should improve timing. Also the core will eventually
be smart enough not to wait for store acks to complete them anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This breaks the long stall signal coming back to the processor
and helps improve overall timing.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, there's a huge mux gathering the output of all the PLRUs
to select the victim way on cache miss. This is fed combinationally
into the clearing of the valid and tags.
In order to help timing, let's store it instead and perform the
clearing on the next cycle. The L2 doesn't respond to requests
when not in IDLE state so this should have no negative effects.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We still had some wires bringing an extra serial port out of
litedram for the built-in riscv processor. This is all gone now
so take them out.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some fields might get extended with extra bits, use the appropriate
masks when reading the values.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This increases the number of L2 lines from 32 to 64. The BRAM usage is the
same as they were only half used. There's an increase in LUTs and registers
due to the extra tags and valid bits, but none of it should be in a
space constrained or critical timing path.
We could make it wider instead (256 bytes lines) which would reduce usage
instead, but this increases the latency by 8 cycles. Something to consider
once the L2 is capable of early response on miss and starting reloads
from any point in a line.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Support for this has bitrotted and would require refactoring of L2 to
be brought back. It's also not really needed anymore now that we ship
pre-generated litedram and that LiteX supports what we do.
So take it out, which simplifies some of the scripts as well. This also
fixes up CSR alignment the sim model.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The test bench test simple access forms for now, it's a starting point
but it already helped find/fix a bug.
Includes a litedram update to be able to operate the sim model without
inits.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds a cache between the wishbone and litedram with the following
features (at this point, it's still evolving)
- 128 bytes line width in order to have a reasonable amount of
litedram pipelining on the 128-bit wide data port.
- Configurable geometry otherwise
- Stores are acked immediately on wishbone whether hit or miss
(minus a 2 cycles delay if there's a previous load response in the
way) and sent to LiteDRAM via 8 entries (configurable) store queue
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds an option to disable the main BRAM and instead copy a
payload stashed along with the init code in the secondary BRAM
into DRAM and boot from there
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds a simulated litedram model along with the necessary
Makefile gunk to verilate it and wrap it for use by ghdl.
The core_dram_tb test bench is a variant of core_tb with
LiteDRAM simulated. It's not built by default, an explicit
make core_dram_tb
is necessary as to not require verilator to be installed for
the normal build process (also it's slow'ish).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This changes the SoC interconnect such that the main 64-bit wishbone out
of the processor is first split between only 3 slaves (BRAM, DRAM and a
general "IO" bus) instead of all the slaves in the SoC.
The IO bus leg is then latched and down-converted to 32 bits data width,
before going through a second address decoder for the various IO devices.
This significantly reduces routing and timing pressure on the main bus,
allowing to get rid of frequent timing violations when synthetizing on
small'ish FPGAs such as the Artix-7 35T found on the original Arty board.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>