Commit Graph

131 Commits (dfecda3a5fcb42fe6bc8e83621bf976032ff50c9)

Author SHA1 Message Date
Paul Mackerras 86212dc879 icache: Split PLRU into storage and logic
Rather than having update and decode logic for each individual PLRU
as well as a register to store the current PLRU state, we now put the
PLRU state in a little RAM, which will typically use LUT RAM on FPGAs,
and have just a single copy of the logic to calculate the pseudo-LRU
way and to update the PLRU state.  This logic is in the plrufn module
and is just combinatorial logic.  A new module was created for this as
other parts of the system are still using plru.vhdl.

The PLRU RAM in the icache is read asynchronously in the cycle
after the cache tag matching is done.  At the end of that cycle the
PLRU RAM entry is updated if the access was a cache hit, or a victim
way is calculated and stored if the access was a cache miss and
miss handling is starting in this cycle.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Benjamin Herrenschmidt 39c2abae51 Fix build of core_dram_tb and dram_tb and fix tracing
We disabled --trace by default, so we need to stop linking verilated_vcd_c.o
as it doesn't exist in that case.

While at it, make a Makefile variable to enable/disable verilator tracing
and add a couple of generics to those test benches to control tracing
in the L2 and in litedram.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2 years ago
Dan Horák 1ddbacb67f syscon: Implement a register for storing git hash info
It also stores the dirty status so that's known.

This does some Makefile tricks so that we only rebuild when the git
hash changes. This avoids rebuilding the world every time we run
make.

Also adds fusesoc generator, so that should continue to work as
before.

Signed-off-by: Dan Horák <dan@danny.cz>
Signed-off-by: Michael Neuling <mikey@neuling.org>
2 years ago
Paul Mackerras 595a758400 execute1: Add a pipelined 33-bit signed multiplier
This adds a pipelined 33-bit by 33-bit signed multiplier with one
cycle latency to the execute pipeline, and uses it for the mullw,
mulhw and mulhwu instructions.  Because it has one cycle of latency we
can assume that its result is available in the second execute stage
without needing to add busy logic to the second stage.

This adds both a generic version of the multiplier and a
Xilinx-specific version using four DSP slices of the Artix-7.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 21ab36a0c0 Pre-decode instructions when writing them to icache
This splits out the decoding done in the decode0 step into a separate
predecoder, used when writing instructions into the icache.  The
icache now holds 36 bits per instruction rather than 32.  For valid
instructions, those 36 bits comprise the bottom 26 bits of the
instruction word, a 9-bit insn_code value (which uniquely identifies
the instruction), and a zero in the MSB.  For illegal instructions,
the MSB is one and the full instruction word is in the bottom 32 bits.
Having the full instruction word available for illegal instructions
means that it can be printed in the log when simulating, or in future
could be placed in the HEIR register.

If we don't have an FPU, then the floating-point instructions are
regarded as illegal.  In that case, the insn_code values would fit
into 8 bits, which could be used in future to reduce the size of
decode_rom from 512 to 256 entries.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Matt Johnston 3775650df3 dmi_dtm_ecp5: Use ECP5 JTAGG for DMI
This uses the JTAGG primitive which is similar to BSCANE2.
The LUT4 delay approach came from Florian and Greg in
https://github.com/enjoy-digital/litex/pull/1087

Has been tested on an OrangeCrab with 48MHz sysclk
FT232H up to 30MHz (though libusb/urjtag is by far the bottleneck vs
the JTAG clock)

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Paul Mackerras 2491aa7fc5 core: Make popcnt* take two cycles
This moves the calculation of the result for popcnt* into the
countbits unit, renamed from countzero, so that we can take two cycles
to get the result.  The motivation for this is that the popcnt*
calculation was showing up as a critical path.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Michael Neuling cdd661d844
Merge branch 'master' into orangecrab-merge 3 years ago
Michael Neuling fda8879e2f
Merge pull request #341 from mkj/progtools
orangecrab programming targets
3 years ago
Matt Johnston abc6a4f372 orangecrab: use litesdcard
Currently not working (tested in Linux)

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston a9b467f43b orangecrab: add Orange Crab r0.2 target
top-orangecrab0.2 is a copy of top-arty with various changes.
USRMCLK is added for the SPI clock
ethernet is removed

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 5e90133b61 Makefile: add ecpprog targets
The 0x80000 offset is specific to the OrangeCrab bootloader.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 7761bf8b71 Makefile: Add DFU programming
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 2ec0d5fccd Makefile: detect when ghdl is a yosys plugin
oss-cad-suite builds it as a plugin, some other toolchains
have it built in.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Joel Stanley 9ceb463957 Makefile: Use read_verilog with yosys
Yosys changed command line behaviour following the v0.12 release.  Work
around this by using read_verilog, which maintains the old behaviour.

This should work fine for current yosys and be compatible with
future releases.

See https://github.com/YosysHQ/yosys/issues/3109

Signed-off-by: Joel Stanley <joel@jms.id.au>
3 years ago
Paul Mackerras a5c9b3c412 Makefile: Add a target for the Orange Crab v0.21 with LFE5U-85F
The existing orange crab target is for an older board with a
LFE5UM5G-85F device.  Newer orange crab boards (v0.21) have a
LFE5U-85F device in the -8 speed grade, so make a new target for them
called ORANGE-CRAB-0.21.

Also add flags to ecppack to indicate that the bitstream should be
compressed and can be loaded at 38.8MHz.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Anton Blanchard 06266fe84a Orange Crab is 48MHz not 50MHz, bump PLL frequency
I'm not sure why I set the input frequency for the Orange Crab to 50MHz.
Since we easily make timing now, bump our output frequency to 48MHz as
well.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard c81583c128 makefile: Check environment for MEMORY_SIZE/RAM_INIT_FILE
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard efb387b0d2 makefile: Add some verilator micropython tests
These are the same micropython tests we use against the ghdl
simulation.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 8acd5a5607 verilator: Specify top level module
While verilator finds the correct top level module with the current
setup, if we start adding simulation models it can get confused.

Explicitly specify the top level module.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 7e2de602ee makefile: Simplify microwatt-verilator target, add Docker image
Recent versions of verilator support the --build option, allowing
us to remove a step.

Also add a Docker image for verilator.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Michael Neuling 2bd00f5119
Merge pull request #315 from paulusmack/pmu
Add basic PMU implementation
3 years ago
Paul Mackerras a7873b45f7 core: Add a basic performance monitor unit (PMU) implementation
This is the start of an implementation of a PMU according to PowerISA
v3.0B.  Things not implemented yet include most architected events,
the BHRB, event-based branches, thresholding, MMCR0[TBCC] field, etc.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
Anton Blanchard 6254bb5ee9 Reduce Yosys ECP5 cell usage by 30% with -abc9 -nowidelut
We've been investigating why the barrel rotator uses an enormous
number of cells on the yosys ECP5 target. Eventually it was narrowed
down to the -abc9 -nowidelut options, which see the cell count go from
4985 cells to 841 cells.

Using the same options on an Orange Crab build reduces the cell count
from 50864 to 36085. The main differences:

     LUT4                        31040 -> 25270
     PFUMX                        6956 ->     0
     L6MUX21                      1746 ->     0
     CCU2C                        2066 ->  1759

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
4 years ago
Michael Neuling bf76261979 makefile: Add check_vunit
Allow newly added vuint run script to be run via make. Also integrate
with DOCKER/PODMAN=1.

Signed-off-by: Michael Neuling <mikey@neuling.org>
4 years ago
Michael Neuling d7458d5beb
Reduce the size of icache to help yosys ECP5 builds (#303)
The icache RAM is currently LUT ram not block ram. This massively
bloats the icache size. We think this is due to yosys not inferencing
the RAM correctly but that's yet to be confirmed.

Work around this for now by reducing the default size of the icache
RAM for the ECP5 builds.

On the ECP5 85K builts, this gets us from 95% down to 76% and helps
our CI to pass.

Signed-off-by: Michael Neuling <mikey@neuling.org>
4 years ago
Lars Asplund 08c0c4c1b4 Make core testbenches recognized by VUnit
This commit also removes the dependencies these testbenches have on VHPIDIRECT.
The use of VHPIDIRECT limits the number of available simulators for the project. Rather than using
foreign functions the testbenches can be implemented entirely in VHDL where equivalent functionality exists.
For these testbenches the VHPIDIRECT-based randomization functions were replaced with VHDL-based functions.

The testbenches recognized by VUnit can be executed in parallel threads for better simulation performance using
the -p option to the run.py script

Signed-off-by: Lars Asplund <lars.anders.asplund@gmail.com>
4 years ago
Michael Neuling 84473eda1b Merge pull request #277 from paulus/gpio
A few cleanups. GPIO IRQ number is now 4 as 3 is now taken by the SD card.
4 years ago
Anton Blanchard be11ebbf6d Remove unused GHDL_TARGET_GENERICS
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
4 years ago
Anton Blanchard 33c78f9282 Move verilator --trace flag into VERILATOR_FLAGS
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
4 years ago
Anton Blanchard 4ab36517ec Remove -frelaxed
We don't appear to need this any more, so remove it.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
4 years ago
Anton Blanchard 561d6af6f0 Use VERILATOR_FLAGS/VERILATOR_CFLAGS on all verilator targets
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
4 years ago
Anton Blanchard 75da4156fe Remove core_files from soc_files and fpga_files
We were already including the core_files at the same time as the
soc_files in many targets.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
4 years ago
Paul Mackerras f06ffcf9b7 Add a GPIO controller and use it to drive the shield I/O pins on the Arty
This adds a GPIO controller which provides 32 bits of I/O.  The
registers are modelled on the set used by the gpio-ftgpio010.c driver
in the Linux kernel.  Currently there is no interrupt capability
implemented, though an interrupt line from the GPIO subsystem to the
XICS has been connected.

For the Arty A7 board, GPIO lines 0 to 13 are connected to the pins
labelled IO0 to IO13 on the "shield" connector, GPIO lines 14 to 29
connect to IO26 to IO41, GPIO line 30 connects to the pin labelled A
(aka IO42), and GPIO line 31 is connected to LED 7.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
Paul Mackerras ae2afeca5c core: Track CR hazards and bypasses using tags
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
Paul Mackerras c0b45e153b core: Track GPR hazards using tags that propagate through the pipelines
This changes the way GPR hazards are detected and tracked.  Instead of
having a model of the pipeline in gpr_hazard.vhdl, which has to mirror
the behaviour of the real pipeline exactly, we now assign a 2-bit tag
to each instruction and record which GSPR the instruction writes.
Subsequent instructions that need to use the GSPR get the tag number
and stall until the value with that tag is being written back to the
register file.

For now, the forwarding paths are disabled.  That gives about a 8%
reduction in coremark performance.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
Anton Blanchard df8e1ca2a7 Add verilator FPGA target
Our Makefiles need some work, but for now create an FPGA target:

make FPGA_TARGET=verilator microwatt-verilator

ghdl and yosys can use containers using PODMAN=1 or DOCKER=1
options.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
4 years ago
Anton Blanchard 4bc5169f78 Fix verilator build
yosys and verilator did not like us passing in the verilog and
exporting it again. Pass the source directly to verilator instead.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
4 years ago
umarcor de808d7a6a makefile: update synthesis containers
Signed-off-by: umarcor <unai.martinezcorral@ehu.eus>
4 years ago
umarcor d05737833e makefile: whitespace cleanup
Signed-off-by: umarcor <unai.martinezcorral@ehu.eus>
4 years ago
Paul Mackerras 856e9e955f core: Add framework for an FPU
This adds the skeleton of a floating-point unit and implements the
mffs and mtfsf instructions.

Execute1 sends FP instructions to the FPU and receives busy,
exception, FP interrupt and illegal interrupt signals from it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
Paul Mackerras 1a7aebeef8 Add random number generator and implement the darn instruction
This adds a true random number generator for the Xilinx FPGAs which
uses a set of chaotic ring oscillators to generate random bits and
then passes them through a Linear Hybrid Cellular Automaton (LHCA) to
remove bias, as described in "High Speed True Random Number Generators
in Xilinx FPGAs" by Catalin Baetoniu of Xilinx Inc., in:

https://pdfs.semanticscholar.org/83ac/9e9c1bb3dad5180654984604c8d5d8137412.pdf

This requires adding a .xdc file to tell vivado that the combinatorial
loops that form the ring oscillators are intentional.  The same
code should work on other FPGAs as well if their tools can be told to
accept the combinatorial loops.

For simulation, the random.vhdl module gets compiled in, which uses
the pseudorand() function to generate random numbers.

Synthesis using yosys uses nonrandom.vhdl, which always signals an
error, causing darn to return 0xffff_ffff_ffff_ffff.

This adds an implementation of the darn instruction.  Darn can return
either raw or conditioned random numbers.  On Xilinx FPGAs, reading a
raw random number gives the output of the ring oscillators, and
reading a conditioned random number gives the output of the LHCA.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Michael Neuling 5aaa63ee3b Add PLL for ECP5 device
Means we can synthesize at 40Mhz (where we currently make timing) and
our UART still works at 115200 baud.

Tested working hello world unmodified with ECP5 eval board. Orange
Crab is updated but is untested.

Signed-off-by: Michael Neuling <mikey@neuling.org>
5 years ago
Anton Blanchard 4e977bf8a9
Merge pull request #220 from mikey/ghdl-makefile
Use $(GHDL) rather than ghdl in Makefile
5 years ago
Michael Neuling 1697f8a08f Use $(GHDL) rather than ghdl in Makefile
Suggestion from @eine in PR #219.

Signed-off-by: Michael Neuling <mikey@neuling.org>
5 years ago
Michael Neuling 10a1a86ba0 Add FPGA_TARGET=ECP5-EVN make option for synthesis build
This allows these targets
  FPGA_TARGET=ORANGE-CRAB make microwatt.bit
  FPGA_TARGET=ECP5-EVN make microwatt.bit
Default is ORANGE-CRAB as before

ECP5-EVN is tested on real hardware. The console only works at 38400 so
needs this in console.c and a recompile of hello_world to work:

  -#define UART_FREQ 115200
  +#define UART_FREQ 38400

With this 'FPGA_TARGET=ECP5-EVN make prog' works on the ECP5 dev board.

Signed-off-by: Michael Neuling <mikey@neuling.org>
5 years ago
Michael Neuling ef0dcf3bc6 Add SYNTH_ECP5_FLAGS option for building
This is useful to specify "-noflatten" which helps CI stay under 8GB
limit.

Normally the AUTONAME stage of yosys will take around 10GB if
operating on the whole design. With -noflatten, AUTONAME occurs only
per VHDL entity, so only consumes around 3GB of memory. This gets us
under the limitations on github actions.

More discussion here:
  https://github.com/antonblanchard/microwatt/pull/209#issuecomment-652186078

Signed-off-by: Michael Neuling <mikey@neuling.org>
5 years ago
Michael Neuling 45fd2354f2 Add ram file to synthesis build dependencies
Signed-off-by: Michael Neuling <mikey@neuling.org>
5 years ago
Michael Neuling 7347786b08 Add uart16550 files to yosys/nextpnr build
These are verilog so need passed to yosys differently than the VHDL
files.

Signed-off-by: Michael Neuling <mikey@neuling.org>
5 years ago
Michael Neuling 3f6d48f2fc Build to tmp file so nextpnr errors don't confuse make
nextpnr will leave an output file around even when it errors out, so
build to a tmp file and move it when we succeed so we don't confuse
make.

Signed-off-by: Michael Neuling <mikey@neuling.org>
5 years ago