We've been investigating why the barrel rotator uses an enormous
number of cells on the yosys ECP5 target. Eventually it was narrowed
down to the -abc9 -nowidelut options, which see the cell count go from
4985 cells to 841 cells.
Using the same options on an Orange Crab build reduces the cell count
from 50864 to 36085. The main differences:
LUT4 31040 -> 25270
PFUMX 6956 -> 0
L6MUX21 1746 -> 0
CCU2C 2066 -> 1759
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
The icache RAM is currently LUT ram not block ram. This massively
bloats the icache size. We think this is due to yosys not inferencing
the RAM correctly but that's yet to be confirmed.
Work around this for now by reducing the default size of the icache
RAM for the ECP5 builds.
On the ECP5 85K builts, this gets us from 95% down to 76% and helps
our CI to pass.
Signed-off-by: Michael Neuling <mikey@neuling.org>
This commit also removes the dependencies these testbenches have on VHPIDIRECT.
The use of VHPIDIRECT limits the number of available simulators for the project. Rather than using
foreign functions the testbenches can be implemented entirely in VHDL where equivalent functionality exists.
For these testbenches the VHPIDIRECT-based randomization functions were replaced with VHDL-based functions.
The testbenches recognized by VUnit can be executed in parallel threads for better simulation performance using
the -p option to the run.py script
Signed-off-by: Lars Asplund <lars.anders.asplund@gmail.com>
This adds a GPIO controller which provides 32 bits of I/O. The
registers are modelled on the set used by the gpio-ftgpio010.c driver
in the Linux kernel. Currently there is no interrupt capability
implemented, though an interrupt line from the GPIO subsystem to the
XICS has been connected.
For the Arty A7 board, GPIO lines 0 to 13 are connected to the pins
labelled IO0 to IO13 on the "shield" connector, GPIO lines 14 to 29
connect to IO26 to IO41, GPIO line 30 connects to the pin labelled A
(aka IO42), and GPIO line 31 is connected to LED 7.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This changes the way GPR hazards are detected and tracked. Instead of
having a model of the pipeline in gpr_hazard.vhdl, which has to mirror
the behaviour of the real pipeline exactly, we now assign a 2-bit tag
to each instruction and record which GSPR the instruction writes.
Subsequent instructions that need to use the GSPR get the tag number
and stall until the value with that tag is being written back to the
register file.
For now, the forwarding paths are disabled. That gives about a 8%
reduction in coremark performance.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Our Makefiles need some work, but for now create an FPGA target:
make FPGA_TARGET=verilator microwatt-verilator
ghdl and yosys can use containers using PODMAN=1 or DOCKER=1
options.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
yosys and verilator did not like us passing in the verilog and
exporting it again. Pass the source directly to verilator instead.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
This adds the skeleton of a floating-point unit and implements the
mffs and mtfsf instructions.
Execute1 sends FP instructions to the FPU and receives busy,
exception, FP interrupt and illegal interrupt signals from it.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds a true random number generator for the Xilinx FPGAs which
uses a set of chaotic ring oscillators to generate random bits and
then passes them through a Linear Hybrid Cellular Automaton (LHCA) to
remove bias, as described in "High Speed True Random Number Generators
in Xilinx FPGAs" by Catalin Baetoniu of Xilinx Inc., in:
https://pdfs.semanticscholar.org/83ac/9e9c1bb3dad5180654984604c8d5d8137412.pdf
This requires adding a .xdc file to tell vivado that the combinatorial
loops that form the ring oscillators are intentional. The same
code should work on other FPGAs as well if their tools can be told to
accept the combinatorial loops.
For simulation, the random.vhdl module gets compiled in, which uses
the pseudorand() function to generate random numbers.
Synthesis using yosys uses nonrandom.vhdl, which always signals an
error, causing darn to return 0xffff_ffff_ffff_ffff.
This adds an implementation of the darn instruction. Darn can return
either raw or conditioned random numbers. On Xilinx FPGAs, reading a
raw random number gives the output of the ring oscillators, and
reading a conditioned random number gives the output of the LHCA.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Means we can synthesize at 40Mhz (where we currently make timing) and
our UART still works at 115200 baud.
Tested working hello world unmodified with ECP5 eval board. Orange
Crab is updated but is untested.
Signed-off-by: Michael Neuling <mikey@neuling.org>
This allows these targets
FPGA_TARGET=ORANGE-CRAB make microwatt.bit
FPGA_TARGET=ECP5-EVN make microwatt.bit
Default is ORANGE-CRAB as before
ECP5-EVN is tested on real hardware. The console only works at 38400 so
needs this in console.c and a recompile of hello_world to work:
-#define UART_FREQ 115200
+#define UART_FREQ 38400
With this 'FPGA_TARGET=ECP5-EVN make prog' works on the ECP5 dev board.
Signed-off-by: Michael Neuling <mikey@neuling.org>
This is useful to specify "-noflatten" which helps CI stay under 8GB
limit.
Normally the AUTONAME stage of yosys will take around 10GB if
operating on the whole design. With -noflatten, AUTONAME occurs only
per VHDL entity, so only consumes around 3GB of memory. This gets us
under the limitations on github actions.
More discussion here:
https://github.com/antonblanchard/microwatt/pull/209#issuecomment-652186078
Signed-off-by: Michael Neuling <mikey@neuling.org>
nextpnr will leave an output file around even when it errors out, so
build to a tmp file and move it when we succeed so we don't confuse
make.
Signed-off-by: Michael Neuling <mikey@neuling.org>
The fetch2 stage existed primarily to provide a stash buffer for the
output of icache when a stall occurred. However, we can get the same
effect -- of having the input to decode1 stay unchanged on a stall
cycle -- by using the read enable of the BRAMs in icache, and by
adding logic to keep the outputs unchanged on a clock cycle when
stall_in = 1. This reduces branch and interrupt latency by one
cycle.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This require the s25fl128s.vhd flash model and FMF libraries,
which will be built when passed to the Makefile via the
FLASH_MODEL_PATH argument. Otherwise a dummy module is used
which ties MISO to '1'.
The model isn't included as I'm not sure its licence (GPL) is
at this point, but it can be obtained from
https://github.com/ozbenh/microspi
FLASH_MODEL_PATH=<path to microspi>/model
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The rewrite of the Makefile to use "ghdl -c" somewhat broke building
the unisim library as ghdl doesn't yet support putting files in
separate libraries from a single command line invocation.
The workaround at the time was to put the entire project in "unisim"
which is ... weird and will break if we try to add another library
such as fmf.
This fixes it by generating the library separately using "ghdl -i"
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We don't run these but we should.
The SOC tests have bit rotted. We need to fix them but leave them out
for now.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Support for this has bitrotted and would require refactoring of L2 to
be brought back. It's also not really needed anymore now that we ship
pre-generated litedram and that LiteX supports what we do.
So take it out, which simplifies some of the scripts as well. This also
fixes up CSR alignment the sim model.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The test bench test simple access forms for now, it's a starting point
but it already helped find/fix a bug.
Includes a litedram update to be able to operate the sim model without
inits.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds a cache between the wishbone and litedram with the following
features (at this point, it's still evolving)
- 128 bytes line width in order to have a reasonable amount of
litedram pipelining on the 128-bit wide data port.
- Configurable geometry otherwise
- Stores are acked immediately on wishbone whether hit or miss
(minus a 2 cycles delay if there's a previous load response in the
way) and sent to LiteDRAM via 8 entries (configurable) store queue
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds a simulated litedram model along with the necessary
Makefile gunk to verilate it and wrap it for use by ghdl.
The core_dram_tb test bench is a variant of core_tb with
LiteDRAM simulated. It's not built by default, an explicit
make core_dram_tb
is necessary as to not require verilator to be installed for
the normal build process (also it's slow'ish).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We still need to a way to our FPGA target on the command line, but this
at least gets us down to a common Makefile.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Instead of building each file one by one (and having to track all
the dependencies manually), use the ghdl -c command that does
analysis and elaboration in one go.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
These provides some info about the SoC (though it's still somewhat
incomplete and needs more work, see comments).
There's also a control register for selecting DRAM vs. BRAM at 0
(and for soft-resetting the SoC but that isn't wired up yet).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>