microwatt

Commit Graph

Author	SHA1	Message	Date
Michael Neuling	d7458d5beb	Reduce the size of icache to help yosys ECP5 builds (#303) The icache RAM is currently LUT ram not block ram. This massively bloats the icache size. We think this is due to yosys not inferring the RAM correctly but that's yet to be confirmed. Work around this for now by reducing the default size of the icache RAM for the ECP5 builds. On the ECP5 85K builds, this gets us from 95% down to 76% and helps our CI to pass. Signed-off-by: Michael Neuling <mikey@neuling.org>	5 years ago
Michael Neuling	84473eda1b	Merge pull request #277 from paulus/gpio A few cleanups. GPIO IRQ number is now 4 as 3 is now taken by the SD card.	5 years ago
Paul Mackerras	21ed730514	arty_a7: Add litesdcard interface This adds litesdcard.v generated from the litex/litesdcard project, along with logic in top-arty.vhdl to connect it into the system. There is now a DMA wishbone coming in to soc.vhdl which is narrower than the other wishbone masters (it has 32-bit data rather than 64-bit) so there is a widening/narrowing adapter between it and the main wishbone master arbiter. Also, litesdcard generates a non-pipelined wishbone for its DMA connection, which needs to be converted to a pipelined wishbone. We have a latch on both the incoming and outgoing sides of the wishbone in order to help make timing (at the cost of two extra cycles of latency). litesdcard generates an interrupt signal which is wired up to input 3 of the ICS (IRQ 19). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	f06a0f4e5a	arty: Update GPIOs for Boxarty BMC Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	f06ffcf9b7	Add a GPIO controller and use it to drive the shield I/O pins on the Arty This adds a GPIO controller which provides 32 bits of I/O. The registers are modelled on the set used by the gpio-ftgpio010.c driver in the Linux kernel. Currently there is no interrupt capability implemented, though an interrupt line from the GPIO subsystem to the XICS has been connected. For the Arty A7 board, GPIO lines 0 to 13 are connected to the pins labelled IO0 to IO13 on the "shield" connector, GPIO lines 14 to 29 connect to IO26 to IO41, GPIO line 30 connects to the pin labelled A (aka IO42), and GPIO line 31 is connected to LED 7. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	0fb207be60	fetch1: Implement a simple branch target cache This implements a cache in fetch1, where each entry stores the address of a simple branch instruction (b or bc) and the target of the branch. When fetching sequentially, if the address being fetched matches the cache entry, then fetching will be redirected to the branch target. The cache has 1024 entries and is direct-mapped, i.e. indexed by bits 11..2 of the NIA. The bus from execute1 now carries information about taken and not-taken simple branches, which fetch1 uses to update the cache. The cache entry is updated for both taken and not-taken branches, with the valid bit being set if the branch was taken and cleared if the branch was not taken. If fetching is redirected to the branch target then that goes down the pipe as a predicted-taken branch, and decode1 does not do any static branch prediction. If fetching is not redirected, then the next instruction goes down the pipe as normal and decode1 does its static branch prediction. In order to make timing, the lookup of the cache is pipelined, so on each cycle the cache entry for the current NIA + 8 is read. This means that after a redirect (from decode1 or execute1), only the third and subsequent sequentially-fetched instructions will be able to be predicted. This improves the coremark value on the Arty A7-100 from about 180 to about 190 (more than 5%). The BTC is optional. Builds for the Artix 7 35-T part have it off by default because the extra ~1420 LUTs it takes mean that the design doesn't fit on the Arty A7-35 board. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	2be2440734	Arty A7: Document pin connections for on-board headers This adds, as comments, lines which would if uncommented define properties which associate the pins of the headers on the Arty A7 board with FPGA pins. It also adds properties for LEDs 1--3, also commented out for now. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Anton Blanchard	80cf489e96	Add LOG_LENGTH to top-generic.vhdl The other top level files allow LOG_LENGTH to be configured. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Paul Mackerras	45cd8f4fc3	core: Add support for floating-point loads and stores This extends the register file so it can hold FPR values, and implements the FP loads and stores that do not require conversion between single and double precision. We now have the FP, FE0 and FE1 bits in MSR. FP loads and stores cause a FP unavailable interrupt if MSR[FP] = 0. The FPU facilities are optional and their presence is controlled by the HAS_FPU generic passed down from the top-level board file. It defaults to true for all except the A7-35 boards. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Michael Neuling	6d6cf59bb7	Merge pull request #235 from paulusmack/master More instructions and a random number generator	6 years ago
Boris Shingarov	679c547e5f	fpga: Add support for Genesys2 Signed-off-by: Boris Shingarov <shingarov@labware.com>	6 years ago
Benjamin Herrenschmidt	dbb137437c	acorn: Add support for the Acorn CLE 215+ This is a NiteFury based PCIe M2 form-factor board originally used for mining. It contains a speed grade 2 Artix 7 200T, 1GB of DDR3 and 32MB of flash. The serial port is routed to pin 2 (RX) and 3 (TX) of the P2 connector (pin 1 is GND). Note: Only 16MB of flash is currently usable until code is added to configure the flash controller to use 4-bytes address commands on that part. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Paul Mackerras	1a7aebeef8	Add random number generator and implement the darn instruction This adds a true random number generator for the Xilinx FPGAs which uses a set of chaotic ring oscillators to generate random bits and then passes them through a Linear Hybrid Cellular Automaton (LHCA) to remove bias, as described in "High Speed True Random Number Generators in Xilinx FPGAs" by Catalin Baetoniu of Xilinx Inc., in: https://pdfs.semanticscholar.org/83ac/9e9c1bb3dad5180654984604c8d5d8137412.pdf This requires adding a .xdc file to tell vivado that the combinatorial loops that form the ring oscillators are intentional. The same code should work on other FPGAs as well if their tools can be told to accept the combinatorial loops. For simulation, the random.vhdl module gets compiled in, which uses the pseudorand() function to generate random numbers. Synthesis using yosys uses nonrandom.vhdl, which always signals an error, causing darn to return 0xffff_ffff_ffff_ffff. This adds an implementation of the darn instruction. Darn can return either raw or conditioned random numbers. On Xilinx FPGAs, reading a raw random number gives the output of the ring oscillators, and reading a conditioned random number gives the output of the LHCA. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Benjamin Herrenschmidt	02abb135a8	litedram: l2: Add support for more geometries Make the DRAM data lines and user port width configurable, also don't hard wire dependency on the wishbone data width. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	b0241d9f2d	corefile/nexys_video: Parameter fixes This fixes up a few issues with parameters: Only arty has "has_uart1" since we haven't added plumbing for a second UART anywhere else. Also "uart_is_16550" was missing on one of the nexys_video targets, and nexys_video toplevel was missing LOG_LENGTH. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	a5fa92f71b	fpga: nexys-video: Wire up core_alt_reset It looks like we left it dangling Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	5449d842dd	nexys_video: Fix nexys-video build Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Michael Neuling	5aaa63ee3b	Add PLL for ECP5 device Means we can synthesize at 40Mhz (where we currently make timing) and our UART still works at 115200 baud. Tested working hello world unmodified with ECP5 eval board. Orange Crab is updated but is untested. Signed-off-by: Michael Neuling <mikey@neuling.org>	6 years ago
Benjamin Herrenschmidt	fb5c16d05e	uart: Make 16550 the default Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	7575b1e0c2	uart: Import and hook up opencore 16550 compatible UART This imports via fusesoc a 16550 compatible (ie "standard") UART, and wires it up optionally in the SoC instead of the potato one. This also adds support for a second UART (which is always a 16550) to Arty, wired to JC "bottom" port. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	8366710217	liteeth: Hook up LiteX LiteEth ethernet controller Currently only generated for Arty. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Joel Stanley	60e5f7b958	spi: Fix dat_i_l constraints No cells matched 'get_cells -hierarchical -filter {NAME =~/spi_rxtx/dat_i_l}'. [build/microwatt_0/src/microwatt_0/fpga/arty_a7.xdc:42] The signal is in its own process so the net name ends up being spi_rxtx/input_delay_1.dat_i_l_reg. After this change the log shows: Applied set_property IOB = TRUE for soc0/\spiflash_gen.spiflash /spi_rxtx/\input_delay_1.dat_i_l_reg . (constraint file fpga/arty_a7.xdc, line 42). Applied set_property IOB = TRUE for soc0/\spiflash_gen.spiflash /spi_rxtx/\input_delay_1.dat_i_l_reg . (constraint file fpga/arty_a7.xdc, line 42). Applied set_property IOB = TRUE for soc0/\spiflash_gen.spiflash /spi_rxtx/\input_delay_1.dat_i_l_reg . (constraint file fpga/arty_a7.xdc, line 42). Applied set_property IOB = TRUE for soc0/\spiflash_gen.spiflash /spi_rxtx/\input_delay_1.dat_i_l_reg . (constraint file fpga/arty_a7.xdc, line 42). Signed-off-by: Joel Stanley <joel@jms.id.au>	6 years ago
Michael Neuling	b90a0a2139	Merge pull request #208 from paulusmack/faster Make the core go faster Several major improvements in here: - Simple branch predictor - Reduced latency for mispredicted branches and interrupts by removing fetch2 stage - Cache improvements o Request critical dword first on refill o Handle hits while refilling, including on line being refilled o Sizes doubled for both D and I - Loadstore improvements: can now do one load or store every two cycles in most cases - Optimized 2-cycle multiplier for Xilinx 7-series parts using DSP slices - Timing improvements, including: o Stash buffer in decode1 o Reduced width of execute1 result mux o Improved SPR decode in decode1 o Some non-critical operations take a cycle longer so we can break some long combinatorial chains - Core logging: logs 256 bits of info every cycle into a ring buffer, to help with debugging and performance analysis This increases the LUT usage for the "synth" + A35 target from 9182 to 10297 = 12%.	6 years ago
Paul Mackerras	78de4fef72	Make LOG_LENGTH configurable per FPGA variant This plumbs the LOG_LENGTH parameter (which controls how many entries the core log RAM has) up to the top level so that it can be set on the fusesoc command line and have different default values on different FPGAs. It now defaults to 512 entries generally and on the Artix-7 35 parts, and 2048 on the larger Artix-7 FPGAs. It can be set to 0 if desired. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Benjamin Herrenschmidt	f9f18906a3	soc: Rename wb_dram_ctrl to wb_ext_io and rework decoding This makes the control bus currently going out of "soc" towards litedram more generic for external IO devices added by the top-level rather than inside the SoC proper. This is mostly renaming of signals and a small change on how the address decoder operates, using a separate "cascaded" decode for the external IOs. We make the region 0xc8nn_nnnn be the "external IO" region for now. This will make it easier / cleaner to add more external devices. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	bf7def5503	soc: Don't require dram wishbones signals to be wired by toplevel Currently, when not using litedram, the top level still has to hook up "dummy" wishbones to the main dram and control dram busses coming out of the SoC and provide ack signals. Instead, make the SoC generate the acks internally when not using litedram and use defaults to make the wiring entirely optional. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	1ffc89e58b	soc: Add defaults for some input signals That way the top-level's don't need to assign them Also remove generics that are set to the default anyways Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	4244b54984	soc: Remove unused RESET_LOW generic Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	e5aa0e9dc9	uart: Remove combinational loops on ack and stall signal They hurt timing forcing signals to come from the master and back again in one cycle. Stall isn't sampled by the master unless there is an active cycle so masking it with cyc is pointless. Masking acks is somewhat pointless too as we don't handle early dropping of cyc in any of our slaves properly anyways. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	cc4dcb3597	spi: Add SPI Flash controller This adds an SPI flash controller which supports direct memory-mapped access to the flash along with a manual mode to send commands. The direct mode can be set via generic to default to single wire or quad mode. The controller supports normal, dual and quad accesses with configurable commands, clock divider, dummy clocks etc... The SPI clock can be an even divider of sys_clk starting at 2 (so max 50Mhz with our typical Arty designs). A flash offset is carried via generics to syscon to tell SW about which portion of the flash is reserved for the FPGA bitfile. There is currently no plumbing to make the CPU reset past that address (TBD). Note: Operating at 50Mhz has proven unreliable without adding some delay to the sampling of the input data. I'm working on improving this, in the meantime, I'm leaving the default set at 25 Mhz. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	5ae5f76558	arty/nexys-video: Update XDC The DRAM related pins have some small changes in LiteX, so resync and add the false path information as well. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	a93d9e77c9	litedram: Remove remnants of riscv-inits We still had some wires bringing an extra serial port out of litedram for the built-in riscv processor. This is all gone now so take them out. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	bf1b98b958	litedram: Add support for booting without BRAM This adds an option to disable the main BRAM and instead copy a payload stashed along with the init code in the secondary BRAM into DRAM and boot from there Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	573b6b4bc4	soc: Rework interconnect This changes the SoC interconnect such that the main 64-bit wishbone out of the processor is first split between only 3 slaves (BRAM, DRAM and a general "IO" bus) instead of all the slaves in the SoC. The IO bus leg is then latched and down-converted to 32 bits data width, before going through a second address decoder for the various IO devices. This significantly reduces routing and timing pressure on the main bus, allowing us to get rid of frequent timing violations when synthesizing on small'ish FPGAs such as the Artix-7 35T found on the original Arty board. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Anton Blanchard	ab86b58d95	Exit cleanly from testbench on success Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Anton Blanchard	4e78b8078e	Merge branch 'master' into litedram	6 years ago
Anton Blanchard	f96d179f66	Some yosys fixes This gets the yosys build further along, but I'm now chasing what looks like a yosys bug. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Benjamin Herrenschmidt	7560e8f2ff	arty/nexys: Rework reset with litedram When using litedram, request a much longer PLL reset. This seems to help get rid of all the garbled output after config. Also use the clean system_rst out of litedram as our source of reset for the rest of the SoC (it is synchronized with system_clk and takes pll_locked into account already)	6 years ago
Benjamin Herrenschmidt	3b603402d2	soc_reset: Use counters, add synchronizers In some cases we need to keep the reset held for much longer, so use counters rather than shift registers. Additionally, some signals such as ext_rst and pll_locked or signals going from the ext_clk domain to the pll_clk domain need to be treated as async, and testing them without synchronizers is asking for trouble. Finally, make the external reset also reset the PLL. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	c19b5b8cc7	litedram: Update to new LiteX/LiteDRAM version Things have changed a bit in upstream LiteX. LiteDRAM now exposes a wishbone for the CSRs for example. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	13e84b0bbb	pp_soc_uart: Fix rx synchronizers and ensure stable tx init state The rx synchronizers were ... non existent. Someone forgot to add an if rising_edge(clk) to the process. For tx, ensure that we have a default value so that TX stays high from FPGA configuration to the reset being sampled on the first clock cycle. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	bd42580a42	pp_fifo: Fix full fifo losing all data on simultaneous push & pop The pp_fifo decides whether top = bottom means empty or full based on whether the previous operation was a push or a pop. If the fifo performs both in one cycle, it sets the previous op to pop. That means that a full fifo being added a character and removed one at the same time becomes empty. Instead, just leave the previous op alone. If the fifo was empty, it remains so, if it was full ditto. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	c5f5f50738	hello_world: Use new headers and frequency from syscon This uses the new header files for register definitions and extracts the core frequency from syscon rather than hard coding it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	025cf5efe8	syscon: Add syscon registers These provide some info about the SoC (though it's still somewhat incomplete and needs more work, see comments). There's also a control register for selecting DRAM vs. BRAM at 0 (and for soft-resetting the SoC but that isn't wired up yet). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	2cef3005cd	fpga: Hookup nexys-video to litedram Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	3ac815823c	fpga: Hookup Arty to litedram The old toplevel.vhdl becomes top-generic.vhdl, which is to be used by platforms that do not have a litedram option. Arty has its own top-arty.vhdl which supports litedram and is now hooked up Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	8bb3c8f8b6	soc: Add DRAM address decoding Still not attached to any board Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	3687486d36	Update hello_world for 100Mhz clock Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Anton Blanchard	61d5e61f09	Add a few FFs on the RX input to avoid metastability issues Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Anton Blanchard	f5424f8e71	Reduce simulated and default FPGA RAM to 384kB Micropython has been able to fit into 384kB for ages, so let's reduce our simulated RAM. This is useful for testing if micropython will run on an ECP5 85k, which has enough BRAM for 384kB but not enough for 512kB. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
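
Note on 0fb207be60 (fetch1 branch target cache): the message describes a 1024-entry direct-mapped cache indexed by bits 11..2 of the NIA, with the valid bit set on a taken branch and cleared on a not-taken one. The following is a rough behavioural sketch in Python, not the VHDL from the commit. The names (btc_index, BranchTargetCache) are made up for illustration, comparing the full branch address as the tag is an assumption, and the pipelined lookup at NIA + 8 is left out.

```python
BTC_BITS = 10  # 1024 entries, as stated in the commit message


def btc_index(nia):
    """Direct-mapped index: bits 11..2 of the next instruction address."""
    return (nia >> 2) & ((1 << BTC_BITS) - 1)


class BranchTargetCache:
    """Behavioural model only; entry layout and full-address tag are assumptions."""

    def __init__(self):
        # Each entry is (branch_addr, target, valid) or None if never written.
        self.entries = [None] * (1 << BTC_BITS)

    def update(self, branch_addr, target, taken):
        # Updated for both outcomes: valid is set if taken, cleared if not taken.
        self.entries[btc_index(branch_addr)] = (branch_addr, target, taken)

    def predict(self, nia):
        """Return the redirect target, or None to keep fetching sequentially."""
        entry = self.entries[btc_index(nia)]
        if entry is not None and entry[0] == nia and entry[2]:
            return entry[1]
        return None
```

Two addresses that differ only above bit 11 (for example 0x1004 and 0x2004) share an index, so in this sketch the stored address is what keeps one from being redirected to the other's target.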


90 Commits (46cde3bb23fe464f1fbb68a20275e6dca2a9a73f)
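
Note on bd42580a42 (pp_fifo): the message explains that top = bottom is read as empty or full depending on whether the previous operation was a push or a pop, and that a simultaneous push and pop used to record "pop". The following is a small Python model of that rule, written from the commit message alone. The class name, the buggy flag, and the absence of any gating on push and pop are assumptions, not the actual pp_fifo code.

```python
class FifoModel:
    """Cycle-level model of a ring-buffer FIFO that tracks the previous operation."""

    def __init__(self, depth, buggy=False):
        self.depth = depth
        self.mem = [None] * depth
        self.top = 0            # next slot to write
        self.bottom = 0         # next slot to read
        self.prev_op = "pop"    # reset state: top == bottom means empty
        self.buggy = buggy      # True reproduces the pre-fix behaviour

    def empty(self):
        return self.top == self.bottom and self.prev_op == "pop"

    def full(self):
        return self.top == self.bottom and self.prev_op == "push"

    def cycle(self, push=False, data=None, pop=False):
        """Advance one clock cycle; return the popped value, if any."""
        out = None
        if pop:
            # The read sees the old contents, as a registered read would.
            out = self.mem[self.bottom]
            self.bottom = (self.bottom + 1) % self.depth
        if push:
            self.mem[self.top] = data
            self.top = (self.top + 1) % self.depth
        if push and pop:
            if self.buggy:
                self.prev_op = "pop"   # old behaviour: a full FIFO now looks empty
            # fixed behaviour: leave prev_op alone
        elif push:
            self.prev_op = "push"
        elif pop:
            self.prev_op = "pop"
        return out
```

Filling a depth-4 model and then pushing and popping in the same cycle leaves the fixed variant full, while the buggy=True variant reports empty and so loses the remaining entries.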