Compare commits

...

229 Commits

Author SHA1 Message Date
Paul Mackerras 35e0dbed34
Merge pull request #353 from tianrui-wei/master
fix: fix icache_tb not finishing correctly
12 months ago
Michael Neuling cd52390bf1
Merge pull request #373 from antonblanchard/icache-insn-u-state
icache: Don't output X on i_out.insn
12 months ago
Michael Neuling b983d5080e
Merge pull request #376 from antonblanchard/loadstore-init
loadstore1: reduce U state being output
12 months ago
Michael Neuling d4db331467
Merge pull request #374 from antonblanchard/icache-unused-sig
core: Remove unused icache_inv signal
12 months ago
Michael Neuling ee5e3778ed
Merge pull request #364 from shenki/readme-updates
Readme updates
12 months ago
Michael Neuling c43692f4c7
Merge pull request #372 from antonblanchard/dcache-unused-sig
dcache: remove unused do_write signal
12 months ago
Michael Neuling 956df2c863
Merge pull request #371 from antonblanchard/unused-sig
execute1: sub_mux_sel and result_mux_sel are unused
12 months ago
Michael Neuling 3627f102db
Merge pull request #370 from antonblanchard/divider-init
divider: Fix d_out.overflow U state issue
12 months ago
Paul Mackerras 6e1e763c02
Merge pull request #368 from antonblanchard/icache-pmu-events
icache: Hook up PMU events
12 months ago
Anton Blanchard 1047239a37
Merge pull request #377 from antonblanchard/fpu-init
fpu: Reduce uninitialised signals
12 months ago
Anton Blanchard 9d35340bb1 fpu: Reduce uninitialised signals
Reduce uninitialised signals coming out of the FPU.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
12 months ago
Michael Neuling b82eea5933
Merge pull request #366 from antonblanchard/hello-world-bss
Zero BSS in hello world test
12 months ago
Anton Blanchard d3aff67fa7
Merge pull request #375 from antonblanchard/core_debug-init
core_debug: Initialise gspr_index
12 months ago
Anton Blanchard b47b71821e loadstore1: reduce U state being output
While these signals should only be read when valid is true, they
are only a small number of bits and we want to reduce the amount of
U/X state bouncing around the chip.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
12 months ago
Anton Blanchard 71d4b5ed20 core_debug: Initialise gspr_index
Another case of U state being driven out of a module.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
12 months ago
Anton Blanchard a527d9b959 core: Remove unused icache_inv signal
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
12 months ago
Anton Blanchard e7f0a7c7ac icache: Don't output X on i_out.insn
decode1 has a lot of logic that uses i_out.insn without first looking at
i_iout.valid. Play it safe and never output X state.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
12 months ago
Anton Blanchard 39220be311 dcache: remove unused do_write signal
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
12 months ago
Anton Blanchard 843361f2be execute1: sub_mux_sel and result_mux_sel are unused
Remove them.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
12 months ago
Anton Blanchard d3a7517318 divider: Fix d_out.overflow U state issue
While we should only look at this when d_out.valid = 1, we may as remove
some U state across interfaces.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
12 months ago
Anton Blanchard 1ff852b012
Merge pull request #369 from antonblanchard/loadstore-pmu-init
loadstore1: Initialise PMU events
12 months ago
Anton Blanchard e2438071a1 loadstore1: Initialise PMU events
The loadstore1 PMU events are U state until a load and a store completes.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
12 months ago
Anton Blanchard b7c4d3c5c3
Merge pull request #367 from antonblanchard/fpu-typo
fpu: Fix capitalisation of Execute1ToFPUType
12 months ago
Anton Blanchard f06abb67ad icache: Hook up PMU events
We weren't connecting the icache PMU events up.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
12 months ago
Anton Blanchard 64d2def0c6 fpu: Fix capitalisation of Execute1ToFPUType
While this is not an issue in VHDL, I noticed this when running
a script over the source and we may as well fix it.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
12 months ago
Anton Blanchard ff442d1bdb Zero BSS in hello world test
While trying to reduce U/X state issues, I notice that our BSS is not
being initialised in the hello world test.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
1 year ago
Anton Blanchard b8fc5636a4
Merge pull request #365 from antonblanchard/less-fpga-init
Remove some FPGA style signal inits
1 year ago
Anton Blanchard ebdddcc402 Remove some FPGA style signal inits
These don't work on the ASIC flow, so remove them and initialise
them explicitly where required.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
1 year ago
Anton Blanchard a750365ffa Remove some FPGA style signal inits
These don't work on the ASIC flow, so remove them and initialise
them explicitly where required.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
1 year ago
Joel Stanley 9ec22af256 README: Add Linux on Microwatt instructions
These instructions are similar to those at

 https://ozlabs.org/~joel/microwatt/README

except they describe how to build the artifacts from scratch instead of
downloading them.

Signed-off-by: Joel Stanley <joel@jms.id.au>
1 year ago
Joel Stanley a31725d989 README: Add uart to fusesoc instructions
The SoC defaults to using the uart16550 so provide instructions on how
to fetch that library when seetting up fusesoc.

Also remove the text about a working directory; fusesoc doesn't need
one.

Signed-off-by: Joel Stanley <joel@jms.id.au>
1 year ago
Michael Neuling f5e06c2d4b
Merge pull request #361 from antonblanchard/alt-reset-address
Allow ALT_RESET_ADDRESS to be overridden
1 year ago
Anton Blanchard 948f6f43a7 Allow ALT_RESET_ADDRESS to be overridden
This allows us to boot from flash for example.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
1 year ago
Michael Neuling 8bf48ac094
Merge pull request #360 from antonblanchard/log2ceil-issue
wishbone_bram_wrapper ram_addr_bits is 1 bit off
1 year ago
Anton Blanchard b5accb78b2 wishbone_bram_wrapper ram_addr_bits is 1 bit off
log2ceil() returns the number of bits required to store a value, so we
need to pass in memory_size-1, not memory_size.

Every other user of log2ceil() gets this right.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
1 year ago
Michael Neuling 30fd936c12
Merge pull request #358 from antonblanchard/unused-sig
Remove unused sequential signal from Fetch1ToIcacheType
1 year ago
Michael Neuling af1b76d944
Merge pull request #356 from antonblanchard/fpu-constant
fpu: Make inverse_table a constant
1 year ago
Michael Neuling 9b96ab730c
Merge pull request #357 from antonblanchard/xics-warning
xics: Fix warning when comparing two std_ulogic_vectors
1 year ago
Anton Blanchard 0b39947f8d Remove unused sequential signal from Fetch1ToIcacheType
GHDL synthesis is flagging a warning about this.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
1 year ago
Anton Blanchard 00bf0af21c xics: Fix warning when comparing two std_ulogic_vectors
Use unsigned() to make it clear what we are doing.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
1 year ago
Anton Blanchard 50b4cb9423 fpu: Make inverse_table a constant
GHDL synthesis is complaining that inverse_table is never stored to.
Change it to a constant.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
1 year ago
Tianrui Wei 844ca0e6b5
fix: fix icache_tb not finishing correctly
Setting icache to be privileged and accessing physical memory directly.
And set big_endian to 0 to correspond to the testbench result.

Signed-off-by: Tianrui Wei <tianrui@tianruiwei.com>
1 year ago
Michael Neuling f01f3d233a
Merge pull request #352 from mkj/static-urjtag
mw_debug: Add STATIC_URJTAG flag
1 year ago
Matt Johnston c0c00d05bc mw_debug: Add STATIC_URJTAG flag
Revert to linking dynamically by default, can statically link with
`make STATIC_URJTAG=1`

Fixes #351

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Michael Neuling ffcdaaa92d
Update the README Issues (#350)
We've had these for a while now:
 - D/I cache
 - GPR bypassing
 - Supervisor state (and can boot linux)

We still need Vector/VMX/VSX (and probably some other things)

Signed-off-by: Michael Neuling <mikey@neuling.org>
1 year ago
Michael Neuling b4770197a2
Merge pull request #349 from madscientist159/master
Extend LiteDRAM VHDL wrapper to allow more than one clock line
1 year ago
Raptor Engineering Development Team fcb783a0fb Extend LiteDRAM VHDL wrapper to allow more than one clock line
This is necessary for the upcoming Arctic Tern system enablement,
since Arctic Tern uses two DRAM devices and a separate clock line
is routed to each device.  LiteX handles this behavior correctly,
therefore we assume other hardware exists that uses a similar
DRAM clock design.

Updates from Mikey to fix some compile issues.

Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
1 year ago
Michael Neuling 2b97fb0bf3
Merge pull request #348 from paulusmack/reduce
Reduce LUT usage
1 year ago
Paul Mackerras 0aa898c7a6 xics: Rework the irq_gen process
At present, the loop in the irq_gen process generates a chain of
comparators and other logic to work out the source number and priority
of the most-favoured (lowest priority number) pending interrupt.
This replaces that chain with (1) logic to generate an array of bits,
one per priority, indicating whether any interrupt is pending at that
priority, (2) a priority encoder to select the most favoured priority
with an interrupt pending, (3) logic to generate an array of bits, one
per source, indicating whether an interrupt is pending at the priority
calculated in step 2, and (4) a priority encoder to work out the
lowest numbered source that has an interrupt pending at the selected
priority.  This reduces LUT utilization.

The priority encoder function implemented here uses the optimized
count-leading-zeroes logic from helpers.vhdl.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 year ago
Paul Mackerras 1720a0584a Use alternative count-leading-zeroes algorithm in the FPU and LSU
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 year ago
Paul Mackerras 1086988883 countzero: Use alternative algorithm for higher bits
This implements an alternative count-leading-zeroes algorithm which
uses less LUTs to generate the higher-order bits (2..5) of the
result.

By doing (v | -v) rather than (v & -v), we get a value which has ones
from the MSB down to the rightmost 1 bit in v and then zeroes down to
the LSB.  This means that we can generate the MSB of the result (the
index of the rightmost 1 bit in v) just by looking at bits 63 and 31
of (v | -v), assuming that v is 64 bits.  Bit 4 of the result requires
looking at bits 63, 47, 31 and 15.  In contrast, each bit of the
result using (v & -v), which has a single 1, requires ORing together
32 bits.

It turns out that the minimum LUT usage comes from using (v & -v) to
generate bits 0 and 1 of the result, and using (v | -v) to generate
bits 2 to 5.  This saves almost 60 6-input LUTs on the Artix-7.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 year ago
Paul Mackerras 4cf2921b0b soc: Re-do peripheral address decode to improve timing
This generates a series of io_cycle_* signals which are clean latches
and which become the 'cyc' signals of the wishbone buses going to
various peripherals (syscon, uarts, XICS, GPIO, etc.).  Effectively
this is done by moving the address decoding into the slave_io_latch
process.  The slave_io_type, which drives the multiplexer which
selects which wishbone to look for a response on, is reduced to just 8
values in the expectation that an 8-way multiplexer will use less
logic than one with more than 8 inputs.

With this timing is considerably better on the A7-100T.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 year ago
Michael Neuling 27b660ef76
Merge pull request #346 from mkj/dmi_ecp5
Add DMI and mw_debug for ECP5
1 year ago
Anton Blanchard 5a5a082601
Merge pull request #343 from mikey/orange-crab-ci
ci: Add new Orange Crab build
1 year ago
Matt Johnston 9c64f8a98b mw_debug: Add Lattice ECP5 support
"-b ecp5" will select ECP5 interface that talks to a JTAGG
primitive.

For example with a FT232H JTAG board:

./mw_debug  -t 'ft2232 vid=0x0403 pid=0x6014'  -s 30000000 -b ecp5 mr ff003888 6
Connected to libftdi driver.
Found device ID: 0x41113043
00000000ff003888: 6d6f636c65570a0a  ..Welcom
00000000ff003890: 63694d206f742065  e to Mic
00000000ff003898: 2120747461776f72  rowatt !
00000000ff0038a0: 0000000000000a0a  ........
00000000ff0038a8: 67697320636f5320   Soc sig
00000000ff0038b0: 203a65727574616e  nature:
Core: running
 NIA: c0000000000187f8
 MSR: 9000000000001033

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston 3775650df3 dmi_dtm_ecp5: Use ECP5 JTAGG for DMI
This uses the JTAGG primitive which is similar to BSCANE2.
The LUT4 delay approach came from Florian and Greg in
https://github.com/enjoy-digital/litex/pull/1087

Has been tested on an OrangeCrab with 48MHz sysclk
FT232H up to 30MHz (though libusb/urjtag is by far the bottleneck vs
the JTAG clock)

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston eb20195a10 mw_debug: Link urjtag statically
liburjtag isn't in Debian, so usually we're pointing at a urjtag
build directory when building mw_debug

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston 763138798e mw_debug: use isxdigit for hex arguments
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston 04cc4a842c mw_debug: Add -s frequency argument
Chose -s for speed, vs -f for --force

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston e05ae0c8cb mw_debug: pass target parameters to urjtag
An example

./mw_debug -d -t 'ft2232 vid=0x0403 pid=0x6014'

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Paul Mackerras 49ec80ac3e fetch1/icache1: Remove the use_previous logic
This removes logic that I added some time ago with the thought that it
would enable us to do prefetching in the icache.  This logic detects
when the fetch address is an odd multiple of 4 and the next address in
sequence from the previous cycle.  In that case the instruction we
want is in the output register of the icache RAM already so there is
no need to do another read or any icache tag or TLB lookup.

However, this logic adds complexity, and removing it improves timing,
so this removes it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 year ago
Paul Mackerras cef3660e74
Merge pull request #345 from antonblanchard/popcnt-go-fast
popcnt* timing improvements from Paul
1 year ago
Paul Mackerras 2491aa7fc5 core: Make popcnt* take two cycles
This moves the calculation of the result for popcnt* into the
countbits unit, renamed from countzero, so that we can take two cycles
to get the result.  The motivation for this is that the popcnt*
calculation was showing up as a critical path.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 year ago
Michael Neuling 286757f0f7 ci: Add new Orange Crab build
This builds the Orange Crab v0.21 + litedram image

Signed-off-by: Michael Neuling <mikey@neuling.org>
1 year ago
Michael Neuling 6ff3b2499c
Merge pull request #342 from mkj/orangecrab-merge
Orangecrab working with litedram

Fixed up a few simple merge conflicts in the Makefile.
1 year ago
Michael Neuling cdd661d844
Merge branch 'master' into orangecrab-merge 1 year ago
Michael Neuling fda8879e2f
Merge pull request #341 from mkj/progtools
orangecrab programming targets
1 year ago
Michael Neuling ffbf2f9964
Merge pull request #340 from mkj/orangecrab-ghdl-plugin
Makefile: detect when ghdl is a yosys plugin
1 year ago
Matt Johnston 049f0549d8 orangecrab: Fix sdcard wishbone addressing
Orangecrab missed out on:

Make wishbone addresses be in units of doublewords or words
Author: Paul Mackerras <paulus@ozlabs.org>
Date:   Wed Sep 15 18:18:09 2021 +1000

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston abc6a4f372 orangecrab: use litesdcard
Currently not working (tested in Linux)

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston 42959184dd litesdcard: add lattice, regenerate
Modifies litescard generate script to take a clock speed.

Regenerated verilog with latest litesdcard
e52c731 ("Bump year.")

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston d794cc70b1 orangecrab: No BTC, LOG_LENGTH, dram NUM_LINES
Reduce litedram NUM_LINES 64->8
This allows us to meet timing. Can probably
be improved in future with better BRAM usage.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston a8d9203c5d orangecrab: Use litedram
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston 57d4c4c117 orangecrab: set HAS_SHORT_MULT
It seems free, generated as a single MULT18X18D

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston a9b467f43b orangecrab: add Orange Crab r0.2 target
top-orangecrab0.2 is a copy of top-arty with various changes.
USRMCLK is added for the SPI clock
ethernet is removed

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston 8901e84d8d litedram: Add orangecrab-85-0.2 target
Parameters are based on
https://github.com/gregdavill/OrangeCrab-test-sw/blob/main/hw/OrangeCrab-bitstream.py
and litex-boards orangecrab.py

rtt_nom and cmd_delay are overridden for OrangeCrab, we do the same here.

Generated with litedram and litex
62abf9c ("litedram_gen: Add block_until_ready port parameter to control blocking behaviour.")
add2746a ("tools/litex_cli: Rename wb to bus.")

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston 08021ae28e litedram: set Makefile -Werror
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston 5a3cdc8b22 litedram: disable block_until_ready, regenerate
Recent litedram gets stuck at memtest unless block_until_ready=False.
(discussion in https://github.com/enjoy-digital/litedram/pull/292)

This change regenerates with latest litedram and litex
62abf9c ("litedram_gen: Add block_until_ready port parameter to control blocking behaviour.")
add2746a ("tools/litex_cli: Rename wb to bus.")

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston 5e90133b61 Makefile: add ecpprog targets
The 0x80000 offset is specific to the OrangeCrab bootloader.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston 7761bf8b71 Makefile: Add DFU programming
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Matt Johnston 2ec0d5fccd Makefile: detect when ghdl is a yosys plugin
oss-cad-suite builds it as a plugin, some other toolchains
have it built in.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
1 year ago
Anton Blanchard 67164a6ffa
Merge pull request #338 from shenki/yosys-read-verilog
Makefile: Use read_verilog with yosys
1 year ago
Joel Stanley 9ceb463957 Makefile: Use read_verilog with yosys
Yosys changed command line behaviour following the v0.12 release.  Work
around this by using read_verilog, which maintains the old behaviour.

This should work fine for current yosys and be compatible with
future releases.

See https://github.com/YosysHQ/yosys/issues/3109

Signed-off-by: Joel Stanley <joel@jms.id.au>
1 year ago
Michael Neuling 7fa7b45faa
Merge pull request #337 from paulusmack/fixes
ECP5: Adjust PLL constants so the PLL lock indication works
2 years ago
Paul Mackerras d458b5845c ECP5: Adjust PLL constants so the PLL lock indication works
At present, code (such as simple_random) which produces serial port
output during the first few milliseconds of operation produces garbled
output.  The reason is that the clock has not yet stabilized and is
running slow, resulting in the bit time of the serial characters being
too long.

The ECP5 data sheet says that the phase detector should be operated
between 10 and 400 MHz.  The current code operates it at 2MHz.
Consequently, the PLL lock indication doesn't work, i.e. it is always
zero.  The current code works around that by inverting it, i.e. taking
the "not locked" indication to mean "locked".

Instead, we now run it at 12MHz, chosen because the common external
clock inputs on ECP5 boards are 12MHz and 48MHz.  Normally this would
mean that the available system clock frequencies would be multiples of
12MHz, but this is a little inconvenient as we use 40MHz on the Orange
Crab v0.21 boards.  Instead, by using the secondary clock output for
feedback, we can have any divisor of the PLL frequency as the system
clock frequency.

The ECP5 data sheet says the PLL oscillator can run at 400 to 800
MHz.  Here we choose 480MHz since that allows us to generate 40MHz and
48MHz easily and is a multiple of 12MHz.

With this, the lock signal works correctly, and the inversion can be
removed.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Michael Neuling 8a030502a2
Merge pull request #336 from paulusmack/fixes
Makefile: Correct parameters for the Orange Crab 85F
2 years ago
Paul Mackerras a5c9b3c412 Makefile: Add a target for the Orange Crab v0.21 with LFE5U-85F
The existing orange crab target is for an older board with a
LFE5UM5G-85F device.  Newer orange crab boards (v0.21) have a
LFE5U-85F device in the -8 speed grade, so make a new target for them
called ORANGE-CRAB-0.21.

Also add flags to ecppack to indicate that the bitstream should be
compressed and can be loaded at 38.8MHz.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Michael Neuling 9cbe1f4a17
Merge pull request #334 from antonblanchard/icbi-issue
Add a test for icbi and dcbz issues
2 years ago
Anton Blanchard 099862bee9
Merge pull request #335 from ozbenh/misc
Misc cleanups and icache fix
2 years ago
Benjamin Herrenschmidt e675eba0df icache: req_laddr becomes req_raddr
Uses real_addr_t and only stores the real address bits

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2 years ago
Benjamin Herrenschmidt 5cfa65e836 Introduce addr_to_wb() and wb_to_addr() helpers
These convert addresses to/from wishbone addresses, and use them
in parts of the caches, in order to make the code a bit more readable.

Along the way, rename some functions in the caches to make it a bit
clearer what they operate on and fix a bug in the icache STOP_RELOAD state where
the wb address wasn't properly converted.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2 years ago
Benjamin Herrenschmidt d745995207 Introduce real_addr_t and addr_to_real()
This moves REAL_ADDR_BITS out of the caches and defines a real_addr_t
type for a real address, along with a addr_to_real() conversion helper.

It makes the vhdl a bit more readable

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2 years ago
Anton Blanchard 2d142a6c01 tests/misc: Add a store/dcbz test
We have a bug where an store near a dcbz can cause the dcbz to only zero
8 bytes. Add a test case for this.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 00259458c7 tests/misc: Add an icbi test
We have a bug where an icbi can cause an instruction to execute twice.
Add a test case for this.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 13439c76ba
Merge pull request #333 from ozbenh/wukong
Add support for  QMTech Wukong v2 board
2 years ago
Benjamin Herrenschmidt d564672a82 Regenerate litedram and liteeth
Note: There are a few patches to upstream to fix an upstream breakage
of litedram standalone generator, and fix some issues with liteeth
in the way it's used on Wukong. All these have pending pull requests.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2 years ago
Benjamin Herrenschmidt da0189af1e Add support for QMTech Wukong v2 board
For now only the V2 of the board (slightly different pinout)
and only the A100T variant. I also haven't added GPIOs or anything
else on the PMODs really.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2 years ago
Benjamin Herrenschmidt 621a0f6b28 fpga/clk_gen_plle2: Add support for 50Mhz->100Mhz
50Mhz clkin, 100Mhz sys_clk, as needed for Wukon v2

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2 years ago
Benjamin Herrenschmidt 4b1a413a2f Add support for more spansion flash
That's the one on the Wukong v2

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2 years ago
Anton Blanchard c7579d74b0
Merge pull request #332 from paulusmack/fixes
Bug fixes
2 years ago
Paul Mackerras 70270c066a dcache: Fix bug with dcbz closely following stores with the same tag
This fixes a bug where a dcbz can get incorrectly handled as an
ordinary 8-byte store if it arrives while the dcache state machine is
handling other stores with the same tag value (i.e. within the same
set-sized area of memory).  The logic that says whether to include a
new store in the current wishbone cycle didn't take into account
whether the new store was a dcbz.  This adds a "req.dcbz = '0'" factor
so that it does.  This is necessary because dcbz is handled more like
a cache line refill (but writing to memory rather than reading) than
an ordinary store.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 9b3b57710a icache: Fix icache invalidation
This fixes two bugs in the flash invalidation of the icache.

The first is that an instruction could get executed twice.  The
i-cache RAM is 2 instructions (64 bits) wide, so one read can supply
results for 2 cycles.  The fetch1 stage tells icache when the address
is equal to the address of the previous cycle plus 4, and in cases
where that is true, bit 2 of the address is 1, and the previous cycle
was a cache hit, we just use the second word of the doubleword read
from the cache RAM.  However, the cache hit/miss logic also continues
to operate, so in the case where the first word hits but the second
word misses (because of an icache invalidation or a snoop occurring in
the first cycle), we supply the instruction from the data previously
read from the icache RAM but also stall fetch1 and start a cache
reload sequence, and subsequently supply the second instruction
again.  This fixes the issue by inhibiting req_is_miss and stall_out
when use_previous is true.

The second bug is that if an icache invalidation occurs while
reloading a line, we continue to reload the line, and make it valid
when the reload finishes, even though some of the data may have been
read before the invalidation occurred.  This adds a new state
STOP_RELOAD which we go to if an invalidation happens while we are in
CLR_TAG or WAIT_ACK state.  In STOP_RELOAD state we don't request any
more reads from memory and wait for the reads we have previously
requested to be acked, and then go to IDLE state.  Data returned is
still written to the icache RAM, but that doesn't matter because the
line is invalid and is never made valid.

Note that we don't have to worry about invalidations due to snooped
writes while reloading a line, because the wishbone arbiter won't
switch to another master once it has started sending our reload
requests to memory.  Thus a store to memory will either happen before
any of our reads have got to memory, or after we have finished the
reload (in which case we will no longer be in WAIT_ACK state).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 83dea94793 decode1: Conditional trap instructions don't need to be single-issue
They can generate interrupts, but that doesn't mean they have to
single-issue.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 9aaa6d3ca3
Merge pull request #330 from antonblanchard/orange-crab-freq
Orange Crab is 48MHz not 50MHz, bump PLL frequency
2 years ago
Anton Blanchard 537e446562
Merge pull request #331 from ozbenh/misc
jtag tooling improvements & gitignore fix
2 years ago
Benjamin Herrenschmidt e6cb72fcd9 Add liteeth/build to gitignore
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2 years ago
Benjamin Herrenschmidt b557ec3a05 mw_debug: Default to jtag backend if unspecified
It avoids typing it all the time

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2 years ago
Benjamin Herrenschmidt 4bdfef9a20 mw_debug: Probe cable if unspecified
Instead of defaulting to DigilentHS1

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2 years ago
Benjamin Herrenschmidt 814d6914d0 flash-arty: Add cable argument
To select the cable config. Defaults to arty

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2 years ago
Anton Blanchard af6bc48d36
Merge pull request #329 from paulusmack/wb-fix
Wishbone addressing fix
2 years ago
Anton Blanchard 06266fe84a Orange Crab is 48MHz not 50MHz, bump PLL frequency
I'm not sure why I set the input frequency for the Orange Crab to 50MHz.
Since we easily make timing now, bump our output frequency to 48MHz as
well.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Michael Neuling 0a415410c9
Merge pull request #328 from paulusmack/shortmult
core: Add a short multiplier
2 years ago
Michael Neuling c2f5db6fca
Merge pull request #327 from paulusmack/master
loadstore1 timing improvement
2 years ago
Paul Mackerras ca4eb46aea Make wishbone addresses be in units of doublewords or words
This makes the 64-bit wishbone buses have the address expressed in
units of doublewords (64 bits), and similarly for the 32-bit buses the
address is in units of words (32 bits).  This is to comply with the
wishbone spec.  Previously the addresses on the wishbone buses were in
units of bytes regardless of the bus data width, which is not correct
and caused problems with interfacing with externally-generated logic.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 734e4c4a52 core: Add a short multiplier
This adds an optional 16 bit x 16 bit signed multiplier and uses it
for multiply instructions that return the low 64 bits of the product
(mull[dw][o] and mulli, but not maddld) when the operands are both in
the range -2^15 .. 2^15 - 1.   The "short" 16-bit multiplier produces
its result combinatorially, so a multiply that uses it executes in one
cycle.  This improves the coremark result by about 4%, since coremark
does quite a lot of multiplies and they almost all have operands that
fit into 16 bits.

The presence of the short multiplier is controlled by a generic at the
execute1, SOC, core and top levels.  For now, it defaults to off for
all platforms, and can be enabled using the --has_short_mult flag to
fusesoc.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras bb5f356386 loadstore1: Make r1.req.addr not depend on l_in.valid
Some critical path reports showed r1.req.addr depending on l_in.valid,
which then depended ultimately on the dcache's r1.ls_valid.  In fact
we can update r1.req.addr (and other fields of r1.req, except for
r1.req.valid) independently of l_in.valid as long as busy = 0.
We do also need to preserve r1.req.addr0 when l_in.valid = 0, so we
pull it out of r1.req and store it separately in r1.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Michael Neuling 2224b28c2c
Merge pull request #324 from paulusmack/master
Performance and timing improvements
2 years ago
Paul Mackerras 54b0e8b8c8 core: Predict not-taken conditional branches using BTC
This adds a bit to the BTC to store whether the corresponding branch
instruction was taken last time it was encountered.  That lets us pass
a not-taken prediction down to decode1, which for backwards direct
branches inhibits it from redirecting fetch to the target of the
branch.  This increases coremark by about 2%.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 0cdaa2778f xilinx-mult: Move some registers later in the data flow
This changes s0 to use the P register rather than the A/B/C input
registers, thus improving the timing of the multiplier output.  The
m00, m02 and m03 multipliers now use their P registers rather than the
M registers, moving the addition they do from the second cycle to the
first.

Also, the XOR that inverts the 32 LSBs is moved before the output
register.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 77d9891d2f
Merge pull request #326 from antonblanchard/dcache-nc-fix
dcache: Loads from non-cacheable PTEs load entire 64 bits
2 years ago
Anton Blanchard 39e1d10069
Merge pull request #325 from paulusmack/fixes
decode1: Fix form of isel marked as single-issue
2 years ago
Anton Blanchard b29c58f3d1 dcache: Loads from non-cacheable PTEs load entire 64 bits
A non-cacheable load should only load the data requested and no more. We
do the right thing for real mode cache inhibited storage instructions,
but when loading through a non-cacheable PTE we load the entire 64 bits
regardless of the size.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Paul Mackerras d4cfdb1bfe decode1: Fix form of isel marked as single-issue
The row in the decode table for isel with BC=0 was inadvertently left
marked as single-issue by commit 813f834012 ("Add CR hazard
detection", 2019-10-15).  Fix it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Michael Neuling 09bd01a49e
Merge pull request #323 from paulusmack/fixes
More instruction fixes
2 years ago
Paul Mackerras 06e07c69a8 decode1: Fix maddld and maddhdu to not set CR0
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras a68921edca core: Fix mcrxrx, addpcis and bpermd
- mcrxrx put the bits in the wrong order

- addpcis was setting CR0 if the instruction bit 0 = 1, which it
  shouldn't

- bpermd was producing 0 always and additionally had the wrong bit
  numbering

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Michael Neuling 18eb029f0a
Merge pull request #322 from paulusmack/fixes
Fix bug with load hitting two previous stores
2 years ago
Paul Mackerras ba34914465 tests/misc: Add a test for a load that hits two preceding stores
This checks that the store forwarding machinery in the dcache
correctly combines forwarded stores when they are partial stores
(i.e. only writing part of the doubleword, as for a byte store).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 0b23a5e760 dcache: Simplify data input to improve timing
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 1a9834c506 dcache: Fix bug with forwarding of stores
We have two stages of forwarding to cover the two cycles of latency
between when something is written to BRAM and when that new data can
be read from BRAM.  When the writes to BRAM result from store
instructions, the write may write only some bytes of a row (8 bytes)
and not others, so we have a mask to enable only the written bytes to
be forwarded.  However, we only forward written data from either the
first stage of forwarding or the second, not both.  So if we have
two stores in succession that write different bytes of the same row,
and then a load from the row, we will only forward the data from the
second store, and miss the data from the first store; thus the load
will get the wrong value.

To fix this, we make the decision on which forward stage to use for
each byte individually.  This results in a 4-input multiplexer feeding
r1.data_out, with its inputs being the BRAM, the wishbone, the current
write data, and the 2nd-stage forwarding register.  Each byte of the
multiplexer is separately controlled.  The code for this multiplexer
is moved to the dcache_fast_hit process since it is used for cache
hits as well as cache misses.

This also simplifies the BRAM code by ensuring that we can use the
same source for the BRAM address and way selection for writes, whether
we are writing store data or cache line refill data from memory.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras f812832ad7 dcache: Move way selection and forwarding earlier
This moves the way multiplexer for the data from the BRAM, and the
multiplexers for forwarding data from earlier stores or refills,
before a clock edge rather than after, so that now the data output
from the dcache comes from a clean latch.  To do this we remove the
extra latch on the output of the data BRAM (i.e. ADD_BUF=false) and
rearrange the logic.  The choice whether to forward or not now depends
not on way comparisons but rather on a tag comparisons, for the sake
of timing.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Michael Neuling a9e6263aab
Merge pull request #319 from antonblanchard/verilator-ci
Add some Verilator CI tests
2 years ago
Michael Neuling 8bbb0018b4
Merge pull request #318 from paulusmack/pmu
PMU enhancements
2 years ago
Michael Neuling 05fe709ffc
Merge pull request #320 from antonblanchard/litedram-regenerate
litedram: Regenerate from upstream litex
2 years ago
Anton Blanchard c0f7f54276 litedram: Regenerate from upstream litex
This adds Joel's sdcard feature reporting to the firmware.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard ee38a31152 ci: Add verilator tests
Now we have some verilator tests, add them to the CI.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard c81583c128 makefile: Check environment for MEMORY_SIZE/RAM_INIT_FILE
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard efb387b0d2 makefile: Add some verilator micropython tests
These are the same micropython tests we use against the ghdl
simulation.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 8acd5a5607 verilator: Specify top level module
While verilator finds the correct top level module with the current
setup, if we start adding simulation models it can get confused.

Explicitly specify the top level module.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 7e2de602ee makefile: Simplify microwatt-verilator target, add Docker image
Recent versions of verilator support the --build option, allowing
us to remove a step.

Also add a Docker image for verilator.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Paul Mackerras 65c43b488b PMU: Add several more events
This implements most of the architected PMU events.  The ones missing
are mostly the ones that depend on which level of the cache hierarchy
data is fetched from.  The events implemented here, and their raw
event codes, are:

    Floating-point operation completed (100f4)
    Load completed (100fc)
    Store completed (200f0)
    Icache miss (200fc)
    ITLB miss (100f6)
    ITLB miss resolved (400fc)
    Dcache load miss (400f0)
    Dcache load miss resolved (300f8)
    Dcache store miss (300f0)
    DTLB miss (300fc)
    DTLB miss resolved (200f6)
    No instruction available and none being executed (100f8)
    Instruction dispatched (200f2, 300f2, 400f2)
    Taken branch instruction completed (200fa)
    Branch mispredicted (400f6)
    External interrupt taken (200f8)

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 8cdb00652b
Merge pull request #316 from antonblanchard/verilator-fix
Rename 'do' signal to avoid verilator System Verilog warning
2 years ago
Paul Mackerras 71af8016da
Merge pull request #317 from antonblanchard/gpio-fix
gpio: Add HAS_GPIO to avoid verilator build errors
2 years ago
Paul Mackerras e33fb26e7a PMU: Fix PMC5/6 behaviour when MMCR0[PMCC] = 11
The architecture states that when MMCR0[PMCC] = 0b11, PMC5 and PMC6
are not part of the Performance Monitor, meaning that they are not
controlled by bits in MMCRs, and counter negative conditions in PMCs 5
and 6 don't generate Performance Monitor alerts, exceptions or
interrupts.  It doesn't say that PMC5 and PMC6 are frozen in this
case, so presumably they should continue to count run instructions and
run cycles.

This implements that behaviour.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Anton Blanchard 591e96d1a2 gpio: Add HAS_GPIO to avoid verilator build errors
The verilator build fails with warnings and errors, because NGPIO
is 0 and we do things like:

        gpio_out : out std_ulogic_vector(NGPIO - 1 downto 0);

Set NGPIO to something reasonable (eg 32) and add HAS_GPIO to avoid
building the macro entirely if it isn't in use.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard bc0f7cf236 Rename 'do' signal to avoid verilator System Verilog warning
Experimenting with using ghdl to do VHDL to Verilog conversion (instead
of ghdl+yosys), verilator complains that a signal is a SystemVerilog
keyword:

%Error: microwatt.v:15013:18: Unexpected 'do': 'do' is a SystemVerilog keyword misused as an identifier.
        ... Suggest modify the Verilog-2001 code to avoid SV keywords, or use `begin_keywords or --language.

We could probably make this go away by disabling SystemVerilog, but
it's easy to rename the signal in question. Rename di at the same
time.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Michael Neuling 2bd00f5119
Merge pull request #315 from paulusmack/pmu
Add basic PMU implementation
2 years ago
Paul Mackerras 1896e5f803
Merge pull request #314 from antonblanchard/yosys-go-fast-bits
Reduce Yosys ECP5 cell usage by 30% with -abc9 -nowidelut
2 years ago
Michael Neuling 400e481ffa
Merge pull request #313 from paulusmack/fixes
Fix bug causing FP unavailable interrupts to be missed
2 years ago
Paul Mackerras a7873b45f7 core: Add a basic performance monitor unit (PMU) implementation
This is the start of an implementation of a PMU according to PowerISA
v3.0B.  Things not implemented yet include most architected events,
the BHRB, event-based branches, thresholding, MMCR0[TBCC] field, etc.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Anton Blanchard 6254bb5ee9 Reduce Yosys ECP5 cell usage by 30% with -abc9 -nowidelut
We've been investigating why the barrel rotator uses an enormous
number of cells on the yosys ECP5 target. Eventually it was narrowed
down to the -abc9 -nowidelut options, which see the cell count go from
4985 cells to 841 cells.

Using the same options on an Orange Crab build reduces the cell count
from 50864 to 36085. The main differences:

     LUT4                        31040 -> 25270
     PFUMX                        6956 ->     0
     L6MUX21                      1746 ->     0
     CCU2C                        2066 ->  1759

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Michael Neuling aa4e4e77c4
Merge pull request #311 from antonblanchard/litesdcard-nexys-video
Update litesdcard from upstream and add Nexys Video support
2 years ago
Michael Neuling 65c131e89f
Merge pull request #312 from shenki/sdcard-soc-features
litedram: Add sdcard to soc features
2 years ago
Paul Mackerras f40842d9b2 tests/fpu: Test FPU unavailable interrupt following a load
This adds a load before a floating-point load which should generate a
floating-point unavailable interrupt, to test for the bug where
unavailability interrupts can get dropped while loadstore1 is
executing instructions.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Joel Stanley bc3995804f litedram: Add sdcard to soc features
Signed-off-by: Joel Stanley <joel@jms.id.au>
2 years ago
Paul Mackerras 64e3ce7134 execute1: Handle interrupts during sequences of load/store operations
At present the logic prevents any interrupts from being handled while
there is a load/store instruction (one that has unit=LDST) being
executed.  However, load/store instructions can still get sent to
loadstore1.  Thus an instruction which should generate an interrupt
such as a floating-point unavailable interrupt will instead get
executed.

To fix this, when we detect that an interrupt should be generated but
loadstore1 is still executing a previous instruction, we don't execute
any new instructions, and set a new r.intr_pending flag.  That results
in busy_out being asserted (meaning that no further instructions will
come in from decode2).  When loadstore1 has finished the instructions
it has, the interrupt gets sent to writeback.  If one of the
instructions in loadstore1 generates an interrupt in the meantime, the
l_in.interrupt signal gets asserted and that clears r.intr_pending, so
the interrupt we detected gets discarded.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Anton Blanchard 7cfbcd5514 litesdcard: Add Nexys Video support
This board has a reset line that needs to be held low to power up the
SD card hardware.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 9caaa3fc46 litesdcard: Use vendor not board type
litesdcard provides a macro per vendor (eg xilinx, lattice) and not per
board, so modify the fusesoc generator to take a vendor. This will make
it easier to add litesdcard to more boards.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Paul Mackerras c198b2b82e
Merge pull request #310 from antonblanchard/liteeth-update-2
Update liteeth from upstream and add Nexys Video support
2 years ago
Anton Blanchard 34e10cc52c liteeth: Regenerate from upstream litex
Unfortunately the CSR layout has shifted on upstream litex, so this
is built with the following litex patch backed out:

aad56a047a33 ("integration/soc: Use CSR automatic allocation.")

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 12efb51bcc liteeth: Update yaml config
csr_data_width is no longer required. Add ntxslots and nrxslots
parameters but set them to the default value.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 458dfe01a6 Add liteeth support to Nexys Video
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Michael Neuling cf6df4f17f
Merge pull request #307 from antonblanchard/litedram-update
Updating to latest upstream litedram and some cleanups to associated scripts
2 years ago
Michael Neuling 69a1440204
Merge pull request #309 from antonblanchard/clk-cleanup
Small cleanups to clock definitions
2 years ago
Michael Neuling b885ee7ed1
Merge pull request #308 from antonblanchard/small-fixes
Fix some whitespace issues
2 years ago
Anton Blanchard 75e06a1e30 Remove -add from xdc files
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 187199c489 Remove -waveform from xdc files
A 50% duty cycle is the default, so no need to use -waveform.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 7994b98404 Fix some whitespace issues
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 46cde3bb23
Merge pull request #305 from mikey/noflatten
ci: Remove noflatten to reduce size of ECP5 builds
2 years ago
Anton Blanchard 780d6c754c litedram: Regenerate from upstream litex
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 07f2edc415 litedram: sdrinit() is now sdram_init()
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 346686feb8 litedram: Fix compiler warning
define MAIN_RAM_BASE and MAIN_RAM_SIZE as unsigned long

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard ac546a3024 litedram: Update yaml files
Update the litedram yaml files based on latest upstream.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 6034a9e31f litedram: simplify generate.py
We can call litedram_gen instead of doing the work ourselves.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 3275304a7f litedram: Remove variables.mak
Instead of creating variables.mak, just pass the variables in on
the make command line.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Michael Neuling d6efbb327f ci: Remove noflatten to reduce size of ECP5 builds
This option was added in the commit but is no longer needed for github
CI to work.

    commit ef0dcf3bc6
    Author: Michael Neuling <mikey@neuling.org>
    Date:   Thu Jul 2 14:36:14 2020 +1000
    Add SYNTH_ECP5_FLAGS option for building

Removing noflatten has the added advantage that it gets our builds
from 75% down to 59% usage on ECP5 85K.

Signed-off-by: Michael Neuling <mikey@neuling.org>
2 years ago
Anton Blanchard 0199ff8ca8
Merge pull request #299 from mikey/vunit-make
makefile: Add check_vunit
2 years ago
Michael Neuling 25ab1053e9
Merge pull request #304 from umarcor/ci-backends
ci: test 'build' with LLVM and GCC backends
2 years ago
umarcor de41dfc703 ci: test 'build' with LLVM and GCC backends
Signed-off-by: umarcor <unai.martinezcorral@ehu.eus>
2 years ago
Michael Neuling 0cd826d190
Merge pull request #301 from umarcor/vunit-cleanup
VUnit cleanup
2 years ago
Michael Neuling bf76261979 makefile: Add check_vunit
Allow newly added vuint run script to be run via make. Also integrate
with DOCKER/PODMAN=1.

Signed-off-by: Michael Neuling <mikey@neuling.org>
2 years ago
umarcor 178c2a7da3 VUnit: style
Signed-off-by: umarcor <unai.martinezcorral@ehu.eus>
2 years ago
umarcor 2031c6d2d2 VUnit: use Path.glob instead of glob.glob
Signed-off-by: umarcor <unai.martinezcorral@ehu.eus>
2 years ago
umarcor 7571416f81 ci: add 'workflow_dispatch'
Signed-off-by: umarcor <unai.martinezcorral@ehu.eus>
2 years ago
umarcor faf8309629 ci: in job 'VUnit' use a container step instead of a container job
Signed-off-by: umarcor <unai.martinezcorral@ehu.eus>
2 years ago
Michael Neuling d7458d5beb
Reduce the size of icache to help yosys ECP5 builds (#303)
The icache RAM is currently LUT ram not block ram. This massively
bloats the icache size. We think this is due to yosys not inferencing
the RAM correctly but that's yet to be confirmed.

Work around this for now by reducing the default size of the icache
RAM for the ECP5 builds.

On the ECP5 85K builts, this gets us from 95% down to 76% and helps
our CI to pass.

Signed-off-by: Michael Neuling <mikey@neuling.org>
2 years ago
Michael Neuling f9654428ff
Merge pull request #296 from LarsAsplund/logging-checking
Replaced VHDL assert and report with VUnit checking and logging
2 years ago
Michael Neuling 9e3c756234
Merge pull request #298 from paulusmack/master
MMU: Implement a vestigial partition table
2 years ago
Michael Neuling ff7421c54e
Merge pull request #295 from LarsAsplund/master
Run VHDL tests with VUnit
2 years ago
Paul Mackerras 18120f153d MMU: Implement a vestigial partition table
This implements a 1-entry partition table, so that instead of getting
the process table base address from the PRTBL SPR, the MMU now reads
the doubleword pointed to by the PTCR register plus 8 to get the
process table base address.  The partition table entry is cached.

Having the PTCR and the vestigial partition table reduces the amount
of software change required in Linux for Microwatt support.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Lars Asplund 478b787c10 Replaced VHDL assert and report with VUnit checking and logging
The VUnit log package is a SW style logging framework in VHDL and the check package is an assertion library doing its error reporting with VUnit logging.
These testbenches don't use, and do not need, very advanced logging/checking features but the following was possible to improve

- Checking equality in VHDL can be quite tedious with a lot of type conversions and long message strings to explain the data received and what was expected.
  VUnit's check_equal procedure allow comparison between same or similar types and automatically create the error message for you.
- The code has report statements used for testbench progress reporting and debugging. These were replaced with the info and debug procedures.
  info logs are visible by default while debug is not. This means that debug logs don't have to be commented, which they are now, when not used.
  Instead there is a show procedure making debug messages visible. The show procedure has been commented to hide the debug messages but a more elegant
  solution is to control visibility from a generic and then set that generic from the command line. I've left this as a TODO but the run script allow you to
  extend the standard CLI of VUnit to add new options and you can also set generics from the run script.
- VUnit log messages are color coded if color codes are supported by the terminal. It makes it quicker to spot messages of different types when there are many log messages.
  Error messages will always be made visible on the terminal but you must use the -v (verbose) to see other logs.
- Some tests have a lot of "metvalue detected" warning messages from the numeric_std package and these clutter the logs when using the -v option. VUnit has a simulator independent
  option allowing you to suppress those messages. That option has been enabled.

Signed-off-by: Lars Asplund <lars.anders.asplund@gmail.com>
2 years ago
Lars Asplund 0865704e21 Run VUnit tests in CI
Signed-off-by: Lars Asplund <lars.anders.asplund@gmail.com>
2 years ago
Lars Asplund 0940b8a9d3 Organized VUnit testbenches into test cases.
Several of the testbenches have stimuli code divided into sections preceded with a header comment explaining
what is being tested. These sections have been made into VUnit test cases. The default behavior of VUnit is
to run each test case in a separate simulation which comes with a number of benefits:

* A failing test case doesn't prevent other test cases to be executed
* Test cases are independent. A test case cannot fail as a side-effect to a problem with another test case
* Test execution can be more parallelized and the overall test execution time reduced

Signed-off-by: Lars Asplund <lars.anders.asplund@gmail.com>
2 years ago
Lars Asplund 08c0c4c1b4 Make core testbenches recognized by VUnit
This commit also removes the dependencies these testbenches have on VHPIDIRECT.
The use of VHPIDIRECT limits the number of available simulators for the project. Rather than using
foreign functions the testbenches can be implemented entirely in VHDL where equivalent functionality exists.
For these testbenches the VHPIDIRECT-based randomization functions were replaced with VHDL-based functions.

The testbenches recognized by VUnit can be executed in parallel threads for better simulation performance using
the -p option to the run.py script

Signed-off-by: Lars Asplund <lars.anders.asplund@gmail.com>
2 years ago
Lars Asplund 41d57e6148 Added VUnit run script.
The VUnit run script will find all VHDL files based on given search patterns, figure out their dependencies, and support incremental compile based on the dependencies.
The same script is used for all VUnit supported simulators. Supporting several simulators simplifies the adoption of this project.

At this point only compilation is performed. Coming commits will enable simulation of VHDL testbenches.

Signed-off-by: Lars Asplund <lars.anders.asplund@gmail.com>
2 years ago
Michael Neuling 84473eda1b Merge pull request #277 from paulus/gpio
A few cleanups. GPIO IRQ number is now 4 as 3 is now taken by the SD card.
2 years ago
Michael Neuling 7f44980611
Merge pull request #287 from paulusmack/master
Add SD card interface
2 years ago
Paul Mackerras d8ea64675a
Merge pull request #278 from shenki/openocd-v0.11
Add files for openocd v0.11
2 years ago
Joel Stanley 0d4a0bab6e openocd: Fix verify command for v0.10
v0.11 uses verify_image, which is not supported by v0.11. Use the old
verify_bank for v0.10.

Signed-off-by: Joel Stanley <joel@jms.id.au>
2 years ago
Paul Mackerras 21ed730514 arty_a7: Add litesdcard interface
This adds litesdcard.v generated from the litex/litesdcard project,
along with logic in top-arty.vhdl to connect it into the system.
There is now a DMA wishbone coming in to soc.vhdl which is narrower
than the other wishbone masters (it has 32-bit data rather than
64-bit) so there is a widening/narrowing adapter between it and the
main wishbone master arbiter.

Also, litesdcard generates a non-pipelined wishbone for its DMA
connection, which needs to be converted to a pipelined wishbone.  We
have a latch on both the incoming and outgoing sides of the wishbone
in order to help make timing (at the cost of two extra cycles of
latency).

litesdcard generates an interrupt signal which is wired up to input 3
of the ICS (IRQ 19).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 231003f7c7 icache: Snoop writes to memory by other agents
This makes the icache snoop writes to memory in the same way that the
dcache does, thus making DMA cache-coherent for the icache as well as
the dcache.

This also simplifies the logic for the WAIT_ACK state by removing the
stbs_done variable, since is_last_row(r.store_row, r.end_row_ix) can
only be true when stbs_done is true.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 4c11c9c661 dcache: Simplify logic in RELOAD_WAIT_ACK state
Since the expression is_last_row(r1.store_row, r1.end_row_ix) can only
be true when stbs_done is true, there is no need to include stbs_done
in the expression for the reload being completed, and hence no need to
compute stbs_done in the RELOAD_WAIT_ACK state.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras eb7eba2d92 dcache: Snoop writes to memory by other agents
This adds a path where the wishbone that goes out to memory and I/O
also gets fed back to the dcache, which looks for writes that it
didn't initiate, and invalidates any cache line that gets written to.

This involves a second read port on the cache tag RAM for looking up
the snooped writes, and effectively a second write port on the cache
valid bit array to clear bits corresponding to snoop hits.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 4a8ab3331c
Merge pull request #283 from antonblanchard/whitespace
Fix a few whitespace issues.  Only changes whitespace and comments.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Antony Vennard d9a398dc81
Update documentation. (#280)
Update documentation to reference fusesoc init for Xilinx boards, for
those like me who have never used fusesoc before. Add a reference to the
board files for Digilent boards and comment on perhaps installing them
for other boards as appropriate.

Signed-off-by: Antony Vennard <antony@vennard.ch>
2 years ago
Anton Blanchard 6d827b9358
Merge pull request #286 from antonblanchard/Makefile-cleanup-3
A few more Makefile cleanups
2 years ago
Anton Blanchard be11ebbf6d Remove unused GHDL_TARGET_GENERICS
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 33c78f9282 Move verilator --trace flag into VERILATOR_FLAGS
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 0af906232f
Merge pull request #285 from antonblanchard/Makefile-cleanup-2
A few Makefile cleanups
2 years ago
Anton Blanchard 5cc5d8f030
Merge pull request #281 from antonblanchard/cache-tlb-parameters
Pass icache/dcache/tlb parameters down from soc
2 years ago
Anton Blanchard 4ab36517ec Remove -frelaxed
We don't appear to need this any more, so remove it.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 561d6af6f0 Use VERILATOR_FLAGS/VERILATOR_CFLAGS on all verilator targets
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 75da4156fe Remove core_files from soc_files and fpga_files
We were already including the core_files at the same time as the
soc_files in many targets.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard ef01fa32bd
Merge pull request #284 from antonblanchard/boot-clocks
Allow SPI BOOT_CLOCKS to be overridden by top level
2 years ago
Anton Blanchard af462f0ca9 Reformat spi_flash_ctrl
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 2db89628ab Reformat control
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 21f482f967 Reformat testbenches
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 0d86580ac7 Reformat writeback
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard f67b143165 Reformat plru
Also fix a typo

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard c76e638a77 Reformat rotator
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 601f3211be Reformat divider
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard bf96279ff1 Reformat countzero
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard af7e330d69 Reformat cr_file
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 9208276aa2 Reformat register_file
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 74254bf11a Reformat cache_ram
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 91a53d8001 Allow SPI BOOT_CLOCKS to be overridden by top level
Our SPI controller sends 8 dummy clocks at boot which Ben
added for some Xilinx boards. This should be harmless but
it is confusing the flash testbench in the Caravel project.

Add a parameter so it can be overridden at the top level.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Paul Mackerras f06a0f4e5a arty: Update GPIOs for Boxarty BMC
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Joel Stanley 24a34899b4 Add files for openocd v0.11
The protocol used by the spi bridge firmware changed as of openocd
v0.11. As this is the version packaged by Debian Bullseye, add the
firmware for convince.

Signed-off-by: Joel Stanley <joel@jms.id.au>
2 years ago
Paul Mackerras f06ffcf9b7 Add a GPIO controller and use it to drive the shield I/O pins on the Arty
This adds a GPIO controller which provides 32 bits of I/O.  The
registers are modelled on the set used by the gpio-ftgpio010.c driver
in the Linux kernel.  Currently there is no interrupt capability
implemented, though an interrupt line from the GPIO subsystem to the
XICS has been connected.

For the Arty A7 board, GPIO lines 0 to 13 are connected to the pins
labelled IO0 to IO13 on the "shield" connector, GPIO lines 14 to 29
connect to IO26 to IO41, GPIO line 30 connects to the pin labelled A
(aka IO42), and GPIO line 31 is connected to LED 7.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago

@ -5,12 +5,19 @@ on:
pull_request:
schedule:
- cron: '0 0 * * 5'
workflow_dispatch:

jobs:

build:
runs-on: ubuntu-latest
container: ghdl/vunit:llvm
strategy:
fail-fast: false
matrix:
backend:
- llvm
- gcc
container: ghdl/vunit:${{ matrix.backend }}
steps:
- uses: actions/checkout@v2
- run: make GNATMAKE='gnatmake -j'$(nproc)
@ -33,7 +40,6 @@ jobs:
max-parallel: 3
matrix:
task: [
"tests_unit",
"tests_console",
"{1..99}",
"{100..199}",
@ -52,16 +58,24 @@ jobs:
- uses: actions/checkout@v2
- run: bash -c "make -j$(nproc) ${{ matrix.task }}"

VUnit:
needs: [build]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: docker://ghdl/vunit:llvm
with:
args: python3 ./run.py -p10

symbiflow:
strategy:
fail-fast: false
max-parallel: 2
matrix:
task: [ ECP5-EVN, ORANGE-CRAB ]
task: [ ECP5-EVN, ORANGE-CRAB, ORANGE-CRAB-0.21 ]
runs-on: ubuntu-latest
env:
DOCKER: 1
SYNTH_ECP5_FLAGS: -noflatten
FPGA_TARGET: ${{matrix.task}}
steps:
- uses: actions/checkout@v2
@ -79,3 +93,17 @@ jobs:
steps:
- uses: actions/checkout@v2
- run: make DOCKER=1 microwatt.v

verilator:
runs-on: ubuntu-latest
env:
DOCKER: 1
FPGA_TARGET: verilator
RAM_INIT_FILE: micropython/firmware.hex
MEMORY_SIZE: 524288
steps:
- uses: actions/checkout@v2
- run: |
sudo apt update
sudo apt install -y python3-pexpect
make -j$(nproc) test_micropython_verilator test_micropython_verilator_long

1
.gitignore vendored

@ -13,4 +13,5 @@ tests/*/*.hex
tests/*/*.elf
TAGS
litedram/build/*
liteeth/build/*
obj_dir/*

@ -1,12 +1,22 @@
GHDL ?= ghdl
GHDLFLAGS=--std=08 -frelaxed
GHDLFLAGS=--std=08
CFLAGS=-O3 -Wall
# Need to investigate why yosys is hitting verilator warnings, and eventually turn on -Wall
VERILATOR_FLAGS=-O3 -Wno-fatal -Wno-CASEOVERLAP -Wno-UNOPTFLAT #--trace
# It takes forever to build with optimisation, so disable by default
#VERILATOR_CFLAGS=-O3

GHDLSYNTH ?= ghdl.so
# some yosys builds have ghdl plugin built in, otherwise need "-m ghdl"
GHDLSYNTH ?= $(shell ($(YOSYS) -H | grep -q ghdl) || echo -m ghdl)
YOSYS ?= yosys
NEXTPNR ?= nextpnr-ecp5
ECPPACK ?= ecppack
ECPPROG ?= ecpprog
OPENOCD ?= openocd
VUNITRUN ?= python3 ./run.py
VERILATOR ?= verilator
DFUUTIL ?= dfu-util
DFUSUFFIX ?= dfu-suffix

# We need a version of GHDL built with either the LLVM or gcc backend.
# Fedora provides this, but other distros may not. Another option is to use
@ -29,15 +39,19 @@ PWD = $(shell pwd)
DOCKERARGS = run --rm -v $(PWD):/src:z -w /src
GHDL = $(DOCKERBIN) $(DOCKERARGS) ghdl/ghdl:buster-llvm-7 ghdl
CC = $(DOCKERBIN) $(DOCKERARGS) ghdl/ghdl:buster-llvm-7 gcc
GHDLSYNTH = ghdl
GHDLSYNTH = -m ghdl
YOSYS = $(DOCKERBIN) $(DOCKERARGS) hdlc/ghdl:yosys yosys
NEXTPNR = $(DOCKERBIN) $(DOCKERARGS) hdlc/nextpnr:ecp5 nextpnr-ecp5
ECPPACK = $(DOCKERBIN) $(DOCKERARGS) hdlc/prjtrellis ecppack
OPENOCD = $(DOCKERBIN) $(DOCKERARGS) --device /dev/bus/usb hdlc/prog openocd
VUNITRUN = $(DOCKERBIN) $(DOCKERARGS) ghdl/vunit:llvm python3 ./run.py
VERILATOR = $(DOCKERBIN) $(DOCKERARGS) verilator/verilator:latest
endif

all = core_tb icache_tb dcache_tb multiply_tb dmi_dtm_tb divider_tb \
rotator_tb countzero_tb wishbone_bram_tb soc_reset_tb
VUNITARGS += -p10

all = core_tb icache_tb dcache_tb dmi_dtm_tb \
wishbone_bram_tb soc_reset_tb

all: $(all)

@ -46,20 +60,20 @@ core_files = decode_types.vhdl common.vhdl wishbone_types.vhdl fetch1.vhdl \
decode1.vhdl helpers.vhdl insn_helpers.vhdl \
control.vhdl decode2.vhdl register_file.vhdl \
cr_file.vhdl crhelpers.vhdl ppc_fx_insns.vhdl rotator.vhdl \
logical.vhdl countzero.vhdl multiply.vhdl divider.vhdl execute1.vhdl \
logical.vhdl countbits.vhdl multiply.vhdl divider.vhdl execute1.vhdl \
loadstore1.vhdl mmu.vhdl dcache.vhdl writeback.vhdl core_debug.vhdl \
core.vhdl fpu.vhdl
core.vhdl fpu.vhdl pmu.vhdl

soc_files = $(core_files) wishbone_arbiter.vhdl wishbone_bram_wrapper.vhdl sync_fifo.vhdl \
wishbone_debug_master.vhdl xics.vhdl syscon.vhdl soc.vhdl \
soc_files = wishbone_arbiter.vhdl wishbone_bram_wrapper.vhdl sync_fifo.vhdl \
wishbone_debug_master.vhdl xics.vhdl syscon.vhdl gpio.vhdl soc.vhdl \
spi_rxtx.vhdl spi_flash_ctrl.vhdl

uart_files = $(wildcard uart16550/*.v)

soc_sim_files = $(soc_files) sim_console.vhdl sim_pp_uart.vhdl sim_bram_helpers.vhdl \
soc_sim_files = $(core_files) $(soc_files) sim_console.vhdl sim_pp_uart.vhdl sim_bram_helpers.vhdl \
sim_bram.vhdl sim_jtag_socket.vhdl sim_jtag.vhdl dmi_dtm_xilinx.vhdl \
sim_16550_uart.vhdl \
random.vhdl glibc_random.vhdl glibc_random_helpers.vhdl
foreign_random.vhdl glibc_random.vhdl glibc_random_helpers.vhdl

soc_sim_c_files = sim_vhpi_c.c sim_bram_helpers_c.c sim_console_c.c \
sim_jtag_socket_c.c
@ -76,7 +90,6 @@ $(unisim_lib): $(unisim_lib_files)
$(GHDL) -i --std=08 --work=unisim --workdir=$(unisim_dir) $^
GHDLFLAGS += -P$(unisim_dir)

core_tbs = multiply_tb divider_tb rotator_tb countzero_tb
soc_tbs = core_tb icache_tb dcache_tb dmi_dtm_tb wishbone_bram_tb
soc_flash_tbs = core_flash_tb
soc_dram_tbs = dram_tb core_dram_tb
@ -102,9 +115,6 @@ $(soc_flash_tbs): %: $(soc_sim_files) $(soc_sim_obj_files) $(unisim_lib) $(fmf_l
$(soc_tbs): %: $(soc_sim_files) $(soc_sim_obj_files) $(unisim_lib) %.vhdl
$(GHDL) -c $(GHDLFLAGS) $(soc_sim_link) $(soc_sim_files) $@.vhdl -e $@

$(core_tbs): %: $(core_files) glibc_random.vhdl glibc_random_helpers.vhdl %.vhdl
$(GHDL) -c $(GHDLFLAGS) $(core_files) glibc_random.vhdl glibc_random_helpers.vhdl $@.vhdl -e $@

soc_reset_tb: fpga/soc_reset_tb.vhdl fpga/soc_reset.vhdl
$(GHDL) -c $(GHDLFLAGS) fpga/soc_reset_tb.vhdl fpga/soc_reset.vhdl -e $@

@ -115,10 +125,8 @@ $(soc_dram_tbs):
$(error "Verilator is required to make this target !")
else

VERILATOR_CFLAGS=-O3
VERILATOR_FLAGS=-O3
verilated_dram: litedram/generated/sim/litedram_core.v
verilator $(VERILATOR_FLAGS) -CFLAGS $(VERILATOR_CFLAGS) -Wno-fatal --cc $< --trace
verilator $(VERILATOR_FLAGS) -CFLAGS $(VERILATOR_CFLAGS) -Wno-fatal --cc $<
make -C obj_dir -f ../litedram/extras/sim_dram_verilate.mk VERILATOR_ROOT=$(VERILATOR_ROOT)

SIM_DRAM_CFLAGS = -I. -Iobj_dir -Ilitedram/generated/sim -I$(VERILATOR_ROOT)/include -I$(VERILATOR_ROOT)/include/vltstd
@ -126,7 +134,7 @@ SIM_DRAM_CFLAGS += -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -DVL_PRINTF=printf -fa
sim_litedram_c.o: litedram/extras/sim_litedram_c.cpp verilated_dram
$(CC) $(CPPFLAGS) $(SIM_DRAM_CFLAGS) $(CFLAGS) -c $< -o $@

soc_dram_files = $(soc_files) litedram/extras/litedram-wrapper-l2.vhdl litedram/generated/sim/litedram-initmem.vhdl
soc_dram_files = $(core_files) $(soc_files) litedram/extras/litedram-wrapper-l2.vhdl litedram/generated/sim/litedram-initmem.vhdl
soc_dram_sim_files = $(soc_sim_files) litedram/extras/sim_litedram.vhdl
soc_dram_sim_obj_files = $(soc_sim_obj_files) sim_litedram_c.o
dram_link_files=-Wl,obj_dir/Vlitedram_core__ALL.a -Wl,obj_dir/verilated.o -Wl,obj_dir/verilated_vcd_c.o -Wl,-lstdc++
@ -137,25 +145,54 @@ $(soc_dram_tbs): %: $(soc_dram_files) $(soc_dram_sim_files) $(soc_dram_sim_obj_f
endif

# Hello world
MEMORY_SIZE=8192
RAM_INIT_FILE=hello_world/hello_world.hex
MEMORY_SIZE ?=8192
RAM_INIT_FILE ?=hello_world/hello_world.hex

# Micropython
#MEMORY_SIZE=393216
#RAM_INIT_FILE=micropython/firmware.hex

FPGA_TARGET ?= ORANGE-CRAB
FPGA_TARGET ?= ORANGE-CRAB-0.21

# FIXME: icache RAMs aren't being inferrenced as block RAMs on ECP5
# with yosys, so make it smaller for now as a workaround.
ICACHE_NUM_LINES=4

clkgen=fpga/clk_gen_ecp5.vhd
toplevel=fpga/top-generic.vhdl
dmi_dtm=dmi_dtm_dummy.vhdl
LITEDRAM_GHDL_ARG=

# OrangeCrab with ECP85
# OrangeCrab with ECP85 (original v0.0 with UM5G-85 chip)
ifeq ($(FPGA_TARGET), ORANGE-CRAB)
RESET_LOW=true
CLK_INPUT=50000000
CLK_FREQUENCY=40000000
CLK_INPUT=48000000
CLK_FREQUENCY=48000000
LPF=constraints/orange-crab.lpf
PACKAGE=CSFBGA285
NEXTPNR_FLAGS=--um5g-85k --freq 40
NEXTPNR_FLAGS=--um5g-85k --freq 48
OPENOCD_JTAG_CONFIG=openocd/olimex-arm-usb-tiny-h.cfg
OPENOCD_DEVICE_CONFIG=openocd/LFE5UM5G-85F.cfg
ECP_FLASH_OFFSET=0x80000
endif

# OrangeCrab with ECP85 (v0.21)
ifeq ($(FPGA_TARGET), ORANGE-CRAB-0.21)
RESET_LOW=true
CLK_INPUT=48000000
CLK_FREQUENCY=48000000
LPF=constraints/orange-crab-0.2.lpf
PACKAGE=CSFBGA285
NEXTPNR_FLAGS=--85k --speed 8 --freq 48 --timing-allow-fail --ignore-loops
OPENOCD_JTAG_CONFIG=openocd/olimex-arm-usb-tiny-h.cfg
OPENOCD_DEVICE_CONFIG=openocd/LFE5U-85F.cfg
DFU_VENDOR=1209
DFU_PRODUCT=5af0
ECP_FLASH_OFFSET=0x80000
toplevel=fpga/top-orangecrab0.2.vhdl
litedram_target=orangecrab-85-0.2
soc_extra_v += litesdcard/generated/lattice/litesdcard_core.v
dmi_dtm=dmi_dtm_ecp5.vhdl
endif

# ECP5-EVN
@ -170,12 +207,17 @@ OPENOCD_JTAG_CONFIG=openocd/ecp5-evn.cfg
OPENOCD_DEVICE_CONFIG=openocd/LFE5UM5G-85F.cfg
endif

ifneq ($(litedram_target),)
soc_extra_synth += litedram/extras/litedram-wrapper-l2.vhdl \
litedram/generated/$(litedram_target)/litedram-initmem.vhdl
soc_extra_v += litedram/generated/$(litedram_target)/litedram_core.v
LITEDRAM_GHDL_ARG=-gUSE_LITEDRAM=true
endif

GHDL_IMAGE_GENERICS=-gMEMORY_SIZE=$(MEMORY_SIZE) -gRAM_INIT_FILE=$(RAM_INIT_FILE) \
-gRESET_LOW=$(RESET_LOW) -gCLK_INPUT=$(CLK_INPUT) -gCLK_FREQUENCY=$(CLK_FREQUENCY)
-gRESET_LOW=$(RESET_LOW) -gCLK_INPUT=$(CLK_INPUT) -gCLK_FREQUENCY=$(CLK_FREQUENCY) -gICACHE_NUM_LINES=$(ICACHE_NUM_LINES) \
$(LITEDRAM_GHDL_ARG)

clkgen=fpga/clk_gen_ecp5.vhd
toplevel=fpga/top-generic.vhdl
dmi_dtm=dmi_dtm_dummy.vhdl

ifeq ($(FPGA_TARGET), verilator)
RESET_LOW=true
@ -184,22 +226,20 @@ CLK_FREQUENCY=50000000
clkgen=fpga/clk_gen_bypass.vhd
endif

fpga_files = $(core_files) $(soc_files) fpga/soc_reset.vhdl \
fpga_files = fpga/soc_reset.vhdl \
fpga/pp_fifo.vhd fpga/pp_soc_uart.vhd fpga/main_bram.vhdl \
nonrandom.vhdl

synth_files = $(core_files) $(soc_files) $(fpga_files) $(clkgen) $(toplevel) $(dmi_dtm)
synth_files = $(core_files) $(soc_files) $(soc_extra_synth) $(fpga_files) $(clkgen) $(toplevel) $(dmi_dtm)

microwatt.json: $(synth_files) $(RAM_INIT_FILE)
$(YOSYS) -m $(GHDLSYNTH) -p "ghdl --std=08 --no-formal $(GHDL_IMAGE_GENERICS) $(GHDL_TARGET_GENERICS) $(synth_files) -e toplevel; synth_ecp5 -json $@ $(SYNTH_ECP5_FLAGS)" $(uart_files)
$(YOSYS) $(GHDLSYNTH) -p "ghdl --std=08 --no-formal $(GHDL_IMAGE_GENERICS) $(synth_files) -e toplevel; read_verilog $(uart_files) $(soc_extra_v); synth_ecp5 -abc9 -nowidelut -json $@ $(SYNTH_ECP5_FLAGS)"

microwatt.v: $(synth_files) $(RAM_INIT_FILE)
$(YOSYS) -m $(GHDLSYNTH) -p "ghdl --std=08 --no-formal $(GHDL_IMAGE_GENERICS) $(GHDL_TARGET_GENERICS) $(synth_files) -e toplevel; write_verilog $@"
$(YOSYS) $(GHDLSYNTH) -p "ghdl --std=08 --no-formal $(GHDL_IMAGE_GENERICS) $(synth_files) -e toplevel; write_verilog $@"

# Need to investigate why yosys is hitting verilator warnings, and eventually turn on -Wall
microwatt-verilator: microwatt.v verilator/microwatt-verilator.cpp verilator/uart-verilator.c
verilator -O3 -CFLAGS "-DCLK_FREQUENCY=$(CLK_FREQUENCY)" --assert --cc microwatt.v --exe verilator/microwatt-verilator.cpp verilator/uart-verilator.c -o $@ -Iuart16550 -Wno-fatal -Wno-CASEOVERLAP -Wno-UNOPTFLAT #--trace
make -C obj_dir -f Vmicrowatt.mk
$(VERILATOR) $(VERILATOR_FLAGS) -CFLAGS "$(VERILATOR_CFLAGS) -DCLK_FREQUENCY=$(CLK_FREQUENCY)" -Iuart16550 --assert --cc --exe --build $^ -o $@ -top-module toplevel
@cp -f obj_dir/microwatt-verilator microwatt-verilator

microwatt_out.config: microwatt.json $(LPF)
@ -207,18 +247,36 @@ microwatt_out.config: microwatt.json $(LPF)
mv -f $@.tmp $@

microwatt.bit: microwatt_out.config
$(ECPPACK) --svf microwatt.svf $< $@
$(ECPPACK) --compress --freq 38.8 --svf microwatt.svf $< $@

microwatt.svf: microwatt.bit

prog: microwatt.svf
$(OPENOCD) -f $(OPENOCD_JTAG_CONFIG) -f $(OPENOCD_DEVICE_CONFIG) -c "transport select jtag; init; svf $<; exit"

microwatt.dfu: microwatt.bit
cp $< $@.tmp
$(DFUSUFFIX) -v $(DFU_VENDOR) -p $(DFU_PRODUCT) -a $@.tmp
mv $@.tmp $@

dfuprog: microwatt.dfu
$(DFUUTIL) -a 0 -D $<

ecpprog: microwatt.bit
$(ECPPROG) -S $<

ecpflash: microwatt.bit
test -n "$(ECP_FLASH_OFFSET)" || (echo Error: No ECP_FLASH_OFFSET defined for target; exit 1)
$(ECPPROG) -o $(ECP_FLASH_OFFSET) $<

tests = $(sort $(patsubst tests/%.out,%,$(wildcard tests/*.out)))
tests_console = $(sort $(patsubst tests/%.console_out,%,$(wildcard tests/*.console_out)))

tests_console: $(tests_console)

check_vunit:
$(VUNITRUN) $(VUNITARGS)

check: $(tests) tests_console test_micropython test_micropython_long tests_unit

check_light: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 test_micropython test_micropython_long tests_console tests_unit
@ -232,22 +290,24 @@ $(tests_console): core_tb
test_micropython: core_tb
@./scripts/test_micropython.py

test_micropython_verilator: microwatt-verilator
@./scripts/test_micropython_verilator.py

test_micropython_long: core_tb
@./scripts/test_micropython_long.py

tests_core_tb = $(patsubst %_tb,%_tb_test,$(core_tbs))
test_micropython_verilator_long: microwatt-verilator
@./scripts/test_micropython_verilator_long.py

tests_soc_tb = $(patsubst %_tb,%_tb_test,$(soc_tbs))

%_test: %
./$< --assert-level=error > /dev/null

tests_core: $(tests_core_tb)

tests_soc: $(tests_soc_tb)

# FIXME SOC tests have bit rotted, so disable for now
#tests_unit: tests_core tests_soc
tests_unit: tests_core
#tests_unit: tests_soc

TAGS:
find . -name '*.vhdl' | xargs ./scripts/vhdltags

@ -97,15 +97,19 @@ sudo dnf copr enable sharkcz/danny
sudo dnf install fusesoc
```

- Create a working directory and point FuseSoC at microwatt:
- If this is your first time using fusesoc, initialize fusesoc.
This is needed to be able to pull down fussoc library components referenced
by microwatt. Run

```
mkdir microwatt-fusesoc
cd microwatt-fusesoc
fusesoc library add microwatt /path/to/microwatt/
fusesoc init
fusesoc fetch uart16550
fusesoc library add microwatt /path/to/microwatt
```

- Build using FuseSoC. For hello world (Replace nexys_video with your FPGA board such as --target=arty_a7-100):
You may wish to ensure you have [installed Digilent Board files](https://reference.digilentinc.com/vivado/installing-vivado/start#installing_digilent_board_files)
or appropriate files for your board first.

```
fusesoc run --target=nexys_video microwatt --memory_size=16384 --ram_init_file=/path/to/microwatt/fpga/hello_world.hex
@ -118,6 +122,68 @@ You should then be able to see output via the serial port of the board (/dev/tty
fusesoc run --target=nexys_video microwatt
```

## Linux on Microwatt

Mainline Linux supports Microwatt as of v5.14. The Arty A7 is the best tested
platform, but it's also been tested on the OrangeCrab and ButterStick.

1. Use buildroot to create a userspace

A small change is required to glibc in order to support the VMX/AltiVec-less
Microwatt, as float128 support is mandiatory and for this in GCC requires
VSX/AltiVec. This change is included in Joel's buildroot fork, along with a
defconfig:
```
git clone -b microwatt https://github.com/shenki/buildroot
cd buildroot
make ppc64le_microwatt_defconfig
make
```

The output is `output/images/rootfs.cpio`.

2. Build the Linux kernel
```
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
make ARCH=powerpc microwatt_defconfig
make ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- \
CONFIG_INITRAMFS_SOURCE=/buildroot/output/images/rootfs.cpio -j`nproc`
```

The output is `arch/powerpc/boot/dtbImage.microwatt.elf`.

3. Build gateware using FuseSoC

First configure FuseSoC as above.
```
fusesoc run --build --target=arty_a7-100 microwatt --no_bram --memory_size=0
```

The output is `build/microwatt_0/arty_a7-100-vivado/microwatt_0.bit`.

4. Program the flash

This operation will overwrite the contents of your flash.

For the Arty A7 A100, set `FLASH_ADDRESS` to `0x400000` and pass `-f a100`.

For the Arty A7 A35, set `FLASH_ADDRESS` to `0x300000` and pass `-f a35`.
```
microwatt/openocd/flash-arty -f a100 build/microwatt_0/arty_a7-100-vivado/microwatt_0.bit
microwatt/openocd/flash-arty -f a100 dtbImage.microwatt.elf -t bin -a $FLASH_ADDRESS
```

5. Connect to the second USB TTY device exposed by the FPGA

```
minicom -D /dev/ttyUSB1
```

The gateware has firmware that will look at `FLASH_ADDRESS` and attempt to
parse an ELF there, loading it to the address specified in the ELF header
and jumping to it.

## Testing

- A simple test suite containing random execution test cases and a couple of
@ -129,8 +195,5 @@ make -j$(nproc) check

## Issues

This is functional, but very simple. We still have quite a lot to do:

- There are a few instructions still to be implemented
- Need to add caches and bypassing (in progress)
- Need to add supervisor state (in progress)
- There are a few instructions still to be implemented:
- Vector/VMX/VSX

@ -5,21 +5,21 @@ use ieee.math_real.all;

entity cache_ram is
generic(
ROW_BITS : integer := 16;
WIDTH : integer := 64;
TRACE : boolean := false;
ADD_BUF : boolean := false
);
ROW_BITS : integer := 16;
WIDTH : integer := 64;
TRACE : boolean := false;
ADD_BUF : boolean := false
);

port(
clk : in std_logic;
rd_en : in std_logic;
rd_addr : in std_logic_vector(ROW_BITS - 1 downto 0);
rd_data : out std_logic_vector(WIDTH - 1 downto 0);
wr_sel : in std_logic_vector(WIDTH/8 - 1 downto 0);
wr_addr : in std_logic_vector(ROW_BITS - 1 downto 0);
wr_data : in std_logic_vector(WIDTH - 1 downto 0)
);
clk : in std_logic;
rd_en : in std_logic;
rd_addr : in std_logic_vector(ROW_BITS - 1 downto 0);
rd_data : out std_logic_vector(WIDTH - 1 downto 0);
wr_sel : in std_logic_vector(WIDTH/8 - 1 downto 0);
wr_addr : in std_logic_vector(ROW_BITS - 1 downto 0);
wr_data : in std_logic_vector(WIDTH - 1 downto 0)
);

end cache_ram;

@ -35,13 +35,13 @@ architecture rtl of cache_ram is

begin
process(clk)
variable lbit : integer range 0 to WIDTH - 1;
variable mbit : integer range 0 to WIDTH - 1;
variable widx : integer range 0 to SIZE - 1;
constant sel0 : std_logic_vector(WIDTH/8 - 1 downto 0)
variable lbit : integer range 0 to WIDTH - 1;
variable mbit : integer range 0 to WIDTH - 1;
variable widx : integer range 0 to SIZE - 1;
constant sel0 : std_logic_vector(WIDTH/8 - 1 downto 0)
:= (others => '0');
begin
if rising_edge(clk) then
if rising_edge(clk) then
if TRACE then
if wr_sel /= sel0 then
report "write a:" & to_hstring(wr_addr) &
@ -57,29 +57,29 @@ begin
ram(widx)(mbit downto lbit) <= wr_data(mbit downto lbit);
end if;
end loop;
if rd_en = '1' then
rd_data0 <= ram(to_integer(unsigned(rd_addr)));
if TRACE then
report "read a:" & to_hstring(rd_addr) &
" dat:" & to_hstring(ram(to_integer(unsigned(rd_addr))));
end if;
end if;
end if;
if rd_en = '1' then
rd_data0 <= ram(to_integer(unsigned(rd_addr)));
if TRACE then
report "read a:" & to_hstring(rd_addr) &
" dat:" & to_hstring(ram(to_integer(unsigned(rd_addr))));
end if;
end if;
end if;
end process;

buf: if ADD_BUF generate
begin
process(clk)
begin
if rising_edge(clk) then
rd_data <= rd_data0;
end if;
end process;
process(clk)
begin
if rising_edge(clk) then
rd_data <= rd_data0;
end if;
end process;
end generate;

nobuf: if not ADD_BUF generate
begin
rd_data <= rd_data0;
rd_data <= rd_data0;
end generate;

end;

@ -21,6 +21,7 @@ package common is
constant MSR_FE1 : integer := (63 - 55); -- Floating Exception mode
constant MSR_IR : integer := (63 - 58); -- Instruction Relocation
constant MSR_DR : integer := (63 - 59); -- Data Relocation
constant MSR_PMM : integer := (63 - 61); -- Performance Monitor Mark
constant MSR_RI : integer := (63 - 62); -- Recoverable Interrupt
constant MSR_LE : integer := (63 - 63); -- Little Endian

@ -51,9 +52,37 @@ package common is
constant SPR_HSPRG0 : spr_num_t := 304;
constant SPR_HSPRG1 : spr_num_t := 305;
constant SPR_PID : spr_num_t := 48;
constant SPR_PRTBL : spr_num_t := 720;
constant SPR_PTCR : spr_num_t := 464;
constant SPR_PVR : spr_num_t := 287;

-- PMU registers
constant SPR_UPMC1 : spr_num_t := 771;
constant SPR_UPMC2 : spr_num_t := 772;
constant SPR_UPMC3 : spr_num_t := 773;
constant SPR_UPMC4 : spr_num_t := 774;
constant SPR_UPMC5 : spr_num_t := 775;
constant SPR_UPMC6 : spr_num_t := 776;
constant SPR_UMMCR0 : spr_num_t := 779;
constant SPR_UMMCR1 : spr_num_t := 782;
constant SPR_UMMCR2 : spr_num_t := 769;
constant SPR_UMMCRA : spr_num_t := 770;
constant SPR_USIER : spr_num_t := 768;
constant SPR_USIAR : spr_num_t := 780;
constant SPR_USDAR : spr_num_t := 781;
constant SPR_PMC1 : spr_num_t := 787;
constant SPR_PMC2 : spr_num_t := 788;
constant SPR_PMC3 : spr_num_t := 789;
constant SPR_PMC4 : spr_num_t := 790;
constant SPR_PMC5 : spr_num_t := 791;
constant SPR_PMC6 : spr_num_t := 792;
constant SPR_MMCR0 : spr_num_t := 795;
constant SPR_MMCR1 : spr_num_t := 798;
constant SPR_MMCR2 : spr_num_t := 785;
constant SPR_MMCRA : spr_num_t := 786;
constant SPR_SIER : spr_num_t := 784;
constant SPR_SIAR : spr_num_t := 796;
constant SPR_SDAR : spr_num_t := 797;

-- GPR indices in the register file (GPR only)
subtype gpr_index_t is std_ulogic_vector(4 downto 0);

@ -127,6 +156,12 @@ package common is
constant FPSCR_NI : integer := 63 - 61;
constant FPSCR_RN : integer := 63 - 63;

-- Real addresses
-- REAL_ADDR_BITS is the number of real address bits that we store
constant REAL_ADDR_BITS : positive := 56;
subtype real_addr_t is std_ulogic_vector(REAL_ADDR_BITS - 1 downto 0);
function addr_to_real(addr: std_ulogic_vector(63 downto 0)) return real_addr_t;

-- Used for tracking instruction completion and pending register writes
constant TAG_COUNT : positive := 4;
constant TAG_NUMBER_BITS : natural := log2(TAG_COUNT);
@ -165,8 +200,8 @@ package common is
priv_mode : std_ulogic;
big_endian : std_ulogic;
stop_mark: std_ulogic;
sequential: std_ulogic;
predicted : std_ulogic;
pred_ntaken : std_ulogic;
nia: std_ulogic_vector(63 downto 0);
end record;

@ -178,6 +213,12 @@ package common is
insn: std_ulogic_vector(31 downto 0);
big_endian: std_ulogic;
next_predicted: std_ulogic;
next_pred_ntaken: std_ulogic;
end record;

type IcacheEventType is record
icache_miss : std_ulogic;
itlb_miss_resolved : std_ulogic;
end record;

type Decode1ToDecode2Type is record
@ -303,6 +344,51 @@ package common is
is_extended => '0', is_modulus => '0',
neg_result => '0', others => (others => '0'));

type PMUEventType is record
no_instr_avail : std_ulogic;
dispatch : std_ulogic;
ext_interrupt : std_ulogic;
instr_complete : std_ulogic;
fp_complete : std_ulogic;
ld_complete : std_ulogic;
st_complete : std_ulogic;
br_taken_complete : std_ulogic;
br_mispredict : std_ulogic;
ipref_discard : std_ulogic;
itlb_miss : std_ulogic;
itlb_miss_resolved : std_ulogic;
icache_miss : std_ulogic;
dc_miss_resolved : std_ulogic;
dc_load_miss : std_ulogic;
dc_ld_miss_resolved : std_ulogic;
dc_store_miss : std_ulogic;
dtlb_miss : std_ulogic;
dtlb_miss_resolved : std_ulogic;
ld_miss_nocache : std_ulogic;
ld_fill_nocache : std_ulogic;
end record;
constant PMUEventInit : PMUEventType := (others => '0');

type Execute1ToPMUType is record
mfspr : std_ulogic;
mtspr : std_ulogic;
spr_num : std_ulogic_vector(4 downto 0);
spr_val : std_ulogic_vector(63 downto 0);
tbbits : std_ulogic_vector(3 downto 0); -- event bits from timebase
pmm_msr : std_ulogic; -- PMM bit from MSR
pr_msr : std