microwatt

Commit Graph

Author	SHA1	Message	Date
Paul Mackerras	0aa898c7a6	xics: Rework the irq_gen process At present, the loop in the irq_gen process generates a chain of comparators and other logic to work out the source number and priority of the most-favoured (lowest priority number) pending interrupt. This replaces that chain with (1) logic to generate an array of bits, one per priority, indicating whether any interrupt is pending at that priority, (2) a priority encoder to select the most favoured priority with an interrupt pending, (3) logic to generate an array of bits, one per source, indicating whether an interrupt is pending at the priority calculated in step 2, and (4) a priority encoder to work out the lowest numbered source that has an interrupt pending at the selected priority. This reduces LUT utilization. The priority encoder function implemented here uses the optimized count-leading-zeroes logic from helpers.vhdl. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	1720a0584a	Use alternative count-leading-zeroes algorithm in the FPU and LSU Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	1086988883	countzero: Use alternative algorithm for higher bits This implements an alternative count-leading-zeroes algorithm which uses less LUTs to generate the higher-order bits (2..5) of the result. By doing (v | -v) rather than (v & -v), we get a value which has ones from the MSB down to the rightmost 1 bit in v and then zeroes down to the LSB. This means that we can generate the MSB of the result (the index of the rightmost 1 bit in v) just by looking at bits 63 and 31 of (v | -v), assuming that v is 64 bits. Bit 4 of the result requires looking at bits 63, 47, 31 and 15. In contrast, each bit of the result using (v & -v), which has a single 1, requires ORing together 32 bits. It turns out that the minimum LUT usage comes from using (v & -v) to generate bits 0 and 1 of the result, and using (v | -v) to generate bits 2 to 5. This saves almost 60 6-input LUTs on the Artix-7. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	4cf2921b0b	soc: Re-do peripheral address decode to improve timing This generates a series of io_cycle_* signals which are clean latches and which become the 'cyc' signals of the wishbone buses going to various peripherals (syscon, uarts, XICS, GPIO, etc.). Effectively this is done by moving the address decoding into the slave_io_latch process. The slave_io_type, which drives the multiplexer which selects which wishbone to look for a response on, is reduced to just 8 values in the expectation that an 8-way multiplexer will use less logic than one with more than 8 inputs. With this timing is considerably better on the A7-100T. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	49ec80ac3e	fetch1/icache1: Remove the use_previous logic This removes logic that I added some time ago with the thought that it would enable us to do prefetching in the icache. This logic detects when the fetch address is an odd multiple of 4 and the next address in sequence from the previous cycle. In that case the instruction we want is in the output register of the icache RAM already so there is no need to do another read or any icache tag or TLB lookup. However, this logic adds complexity, and removing it improves timing, so this removes it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	cef3660e74	Merge pull request #345 from antonblanchard/popcnt-go-fast popcnt* timing improvements from Paul	4 years ago
Paul Mackerras	2491aa7fc5	core: Make popcnt* take two cycles This moves the calculation of the result for popcnt* into the countbits unit, renamed from countzero, so that we can take two cycles to get the result. The motivation for this is that the popcnt* calculation was showing up as a critical path. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Michael Neuling	6ff3b2499c	Merge pull request #342 from mkj/orangecrab-merge Orangecrab working with litedram Fixed up a few simple merge conflicts in the Makefile.	4 years ago
Michael Neuling	cdd661d844	Merge branch 'master' into orangecrab-merge	4 years ago
Michael Neuling	fda8879e2f	Merge pull request #341 from mkj/progtools orangecrab programming targets	4 years ago
Michael Neuling	ffbf2f9964	Merge pull request #340 from mkj/orangecrab-ghdl-plugin Makefile: detect when ghdl is a yosys plugin	4 years ago
Matt Johnston	049f0549d8	orangecrab: Fix sdcard wishbone addressing Orangecrab missed out on: Make wishbone addresses be in units of doublewords or words Author: Paul Mackerras <paulus@ozlabs.org> Date: Wed Sep 15 18:18:09 2021 +1000 Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>	4 years ago
Matt Johnston	abc6a4f372	orangecrab: use litesdcard Currently not working (tested in Linux) Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>	4 years ago
Matt Johnston	42959184dd	litesdcard: add lattice, regenerate Modifies litescard generate script to take a clock speed. Regenerated verilog with latest litesdcard e52c731 ("Bump year.") Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>	4 years ago
Matt Johnston	d794cc70b1	orangecrab: No BTC, LOG_LENGTH, dram NUM_LINES Reduce litedram NUM_LINES 64->8 This allows us to meet timing. Can probably be improved in future with better BRAM usage. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>	4 years ago
Matt Johnston	a8d9203c5d	orangecrab: Use litedram Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>	4 years ago
Matt Johnston	57d4c4c117	orangecrab: set HAS_SHORT_MULT It seems free, generated as a single MULT18X18D Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>	4 years ago
Matt Johnston	a9b467f43b	orangecrab: add Orange Crab r0.2 target top-orangecrab0.2 is a copy of top-arty with various changes. USRMCLK is added for the SPI clock ethernet is removed Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>	4 years ago
Matt Johnston	8901e84d8d	litedram: Add orangecrab-85-0.2 target Parameters are based on https://github.com/gregdavill/OrangeCrab-test-sw/blob/main/hw/OrangeCrab-bitstream.py and litex-boards orangecrab.py rtt_nom and cmd_delay are overridden for OrangeCrab, we do the same here. Generated with litedram and litex 62abf9c ("litedram_gen: Add block_until_ready port parameter to control blocking behaviour.") add2746a ("tools/litex_cli: Rename wb to bus.") Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>	4 years ago
Matt Johnston	08021ae28e	litedram: set Makefile -Werror Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>	4 years ago
Matt Johnston	5a3cdc8b22	litedram: disable block_until_ready, regenerate Recent litedram gets stuck at memtest unless block_until_ready=False. (discussion in https://github.com/enjoy-digital/litedram/pull/292) This change regenerates with latest litedram and litex 62abf9c ("litedram_gen: Add block_until_ready port parameter to control blocking behaviour.") add2746a ("tools/litex_cli: Rename wb to bus.") Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>	4 years ago
Matt Johnston	5e90133b61	Makefile: add ecpprog targets The 0x80000 offset is specific to the OrangeCrab bootloader. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>	4 years ago
Matt Johnston	7761bf8b71	Makefile: Add DFU programming Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>	4 years ago
Matt Johnston	2ec0d5fccd	Makefile: detect when ghdl is a yosys plugin oss-cad-suite builds it as a plugin, some other toolchains have it built in. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>	4 years ago
Anton Blanchard	67164a6ffa	Merge pull request #338 from shenki/yosys-read-verilog Makefile: Use read_verilog with yosys	4 years ago
Joel Stanley	9ceb463957	Makefile: Use read_verilog with yosys Yosys changed command line behaviour following the v0.12 release. Work around this by using read_verilog, which maintains the old behaviour. This should work fine for current yosys and be compatible with future releases. See https://github.com/YosysHQ/yosys/issues/3109 Signed-off-by: Joel Stanley <joel@jms.id.au>	4 years ago
Michael Neuling	7fa7b45faa	Merge pull request #337 from paulusmack/fixes ECP5: Adjust PLL constants so the PLL lock indication works	5 years ago
Paul Mackerras	d458b5845c	ECP5: Adjust PLL constants so the PLL lock indication works At present, code (such as simple_random) which produces serial port output during the first few milliseconds of operation produces garbled output. The reason is that the clock has not yet stabilized and is running slow, resulting in the bit time of the serial characters being too long. The ECP5 data sheet says that the phase detector should be operated between 10 and 400 MHz. The current code operates it at 2MHz. Consequently, the PLL lock indication doesn't work, i.e. it is always zero. The current code works around that by inverting it, i.e. taking the "not locked" indication to mean "locked". Instead, we now run it at 12MHz, chosen because the common external clock inputs on ECP5 boards are 12MHz and 48MHz. Normally this would mean that the available system clock frequencies would be multiples of 12MHz, but this is a little inconvenient as we use 40MHz on the Orange Crab v0.21 boards. Instead, by using the secondary clock output for feedback, we can have any divisor of the PLL frequency as the system clock frequency. The ECP5 data sheet says the PLL oscillator can run at 400 to 800 MHz. Here we choose 480MHz since that allows us to generate 40MHz and 48MHz easily and is a multiple of 12MHz. With this, the lock signal works correctly, and the inversion can be removed. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Michael Neuling	8a030502a2	Merge pull request #336 from paulusmack/fixes Makefile: Correct parameters for the Orange Crab 85F	5 years ago
Paul Mackerras	a5c9b3c412	Makefile: Add a target for the Orange Crab v0.21 with LFE5U-85F The existing orange crab target is for an older board with a LFE5UM5G-85F device. Newer orange crab boards (v0.21) have a LFE5U-85F device in the -8 speed grade, so make a new target for them called ORANGE-CRAB-0.21. Also add flags to ecppack to indicate that the bitstream should be compressed and can be loaded at 38.8MHz. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Michael Neuling	9cbe1f4a17	Merge pull request #334 from antonblanchard/icbi-issue Add a test for icbi and dcbz issues	5 years ago
Anton Blanchard	099862bee9	Merge pull request #335 from ozbenh/misc Misc cleanups and icache fix	5 years ago
Benjamin Herrenschmidt	e675eba0df	icache: req_laddr becomes req_raddr Uses real_addr_t and only stores the real address bits Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	5cfa65e836	Introduce addr_to_wb() and wb_to_addr() helpers These convert addresses to/from wishbone addresses, and use them in parts of the caches, in order to make the code a bit more readable. Along the way, rename some functions in the caches to make it a bit clearer what they operate on and fix a bug in the icache STOP_RELOAD state where the wb address wasn't properly converted. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	d745995207	Introduce real_addr_t and addr_to_real() This moves REAL_ADDR_BITS out of the caches and defines a real_addr_t type for a real address, along with a addr_to_real() conversion helper. It makes the vhdl a bit more readable Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Anton Blanchard	2d142a6c01	tests/misc: Add a store/dcbz test We have a bug where a store near a dcbz can cause the dcbz to only zero 8 bytes. Add a test case for this. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	00259458c7	tests/misc: Add an icbi test We have a bug where an icbi can cause an instruction to execute twice. Add a test case for this. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	13439c76ba	Merge pull request #333 from ozbenh/wukong Add support for QMTech Wukong v2 board	5 years ago
Benjamin Herrenschmidt	d564672a82	Regenerate litedram and liteeth Note: There are a few patches to upstream to fix an upstream breakage of litedram standalone generator, and fix some issues with liteeth in the way it's used on Wukong. All these have pending pull requests. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	da0189af1e	Add support for QMTech Wukong v2 board For now only the V2 of the board (slightly different pinout) and only the A100T variant. I also haven't added GPIOs or anything else on the PMODs really. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	621a0f6b28	fpga/clk_gen_plle2: Add support for 50Mhz->100Mhz 50Mhz clkin, 100Mhz sys_clk, as needed for Wukong v2 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	4b1a413a2f	Add support for more spansion flash That's the one on the Wukong v2 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Anton Blanchard	c7579d74b0	Merge pull request #332 from paulusmack/fixes Bug fixes	5 years ago
Paul Mackerras	70270c066a	dcache: Fix bug with dcbz closely following stores with the same tag This fixes a bug where a dcbz can get incorrectly handled as an ordinary 8-byte store if it arrives while the dcache state machine is handling other stores with the same tag value (i.e. within the same set-sized area of memory). The logic that says whether to include a new store in the current wishbone cycle didn't take into account whether the new store was a dcbz. This adds a "req.dcbz = '0'" factor so that it does. This is necessary because dcbz is handled more like a cache line refill (but writing to memory rather than reading) than an ordinary store. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	9b3b57710a	icache: Fix icache invalidation This fixes two bugs in the flash invalidation of the icache. The first is that an instruction could get executed twice. The i-cache RAM is 2 instructions (64 bits) wide, so one read can supply results for 2 cycles. The fetch1 stage tells icache when the address is equal to the address of the previous cycle plus 4, and in cases where that is true, bit 2 of the address is 1, and the previous cycle was a cache hit, we just use the second word of the doubleword read from the cache RAM. However, the cache hit/miss logic also continues to operate, so in the case where the first word hits but the second word misses (because of an icache invalidation or a snoop occurring in the first cycle), we supply the instruction from the data previously read from the icache RAM but also stall fetch1 and start a cache reload sequence, and subsequently supply the second instruction again. This fixes the issue by inhibiting req_is_miss and stall_out when use_previous is true. The second bug is that if an icache invalidation occurs while reloading a line, we continue to reload the line, and make it valid when the reload finishes, even though some of the data may have been read before the invalidation occurred. This adds a new state STOP_RELOAD which we go to if an invalidation happens while we are in CLR_TAG or WAIT_ACK state. In STOP_RELOAD state we don't request any more reads from memory and wait for the reads we have previously requested to be acked, and then go to IDLE state. Data returned is still written to the icache RAM, but that doesn't matter because the line is invalid and is never made valid. Note that we don't have to worry about invalidations due to snooped writes while reloading a line, because the wishbone arbiter won't switch to another master once it has started sending our reload requests to memory. Thus a store to memory will either happen before any of our reads have got to memory, or after we have finished the reload (in which case we will no longer be in WAIT_ACK state). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	83dea94793	decode1: Conditional trap instructions don't need to be single-issue They can generate interrupts, but that doesn't mean they have to single-issue. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	9aaa6d3ca3	Merge pull request #330 from antonblanchard/orange-crab-freq Orange Crab is 48MHz not 50MHz, bump PLL frequency	5 years ago
Anton Blanchard	537e446562	Merge pull request #331 from ozbenh/misc jtag tooling improvements & gitignore fix	5 years ago
Benjamin Herrenschmidt	e6cb72fcd9	Add liteeth/build to gitignore Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	b557ec3a05	mw_debug: Default to jtag backend if unspecified It avoids typing it all the time Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
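
The bit trick described in the message of commit 1086988883 can be illustrated in software. The following is a minimal Python sketch, not the VHDL from helpers.vhdl: the function name `lowest_set_bit_index` and the loop structure are assumptions made for illustration, and the sketch only models which bits of (v & -v) and (v | -v) determine each result bit. It computes the index of the rightmost 1 bit of a 64-bit value, which is the quantity the commit message describes; how the hardware maps this onto a leading-zero count is not shown here.

```python
MASK64 = (1 << 64) - 1


def lowest_set_bit_index(v: int) -> int:
    """Index (0..63) of the rightmost 1 bit of a nonzero 64-bit value.

    Hypothetical software model of the approach in the commit message:
    bits 0 and 1 of the result come from (v & -v), bits 2..5 from (v | -v).
    """
    v &= MASK64
    if v == 0:
        raise ValueError("v must be nonzero")

    neg = (-v) & MASK64
    onehot = v & neg   # a single 1 at the rightmost set bit of v
    fill = v | neg     # ones from the MSB down to that bit, zeroes below

    result = 0

    # Bits 0 and 1: OR together every one-hot position whose index has
    # that bit set (32 positions each, as the message notes).
    if onehot & 0xAAAAAAAAAAAAAAAA:
        result |= 1
    if onehot & 0xCCCCCCCCCCCCCCCC:
        result |= 2

    # Bits 2..5: result bit k is set when the rightmost 1 lies in the upper
    # half of some aligned block of 2**(k+1) bits. In `fill` that shows up
    # as: the top bit of the block is 1 and the top bit of its lower half
    # is 0. For k = 5 this inspects bits 63 and 31; for k = 4 it inspects
    # bits 63, 47, 31 and 15.
    for k in range(2, 6):
        half = 1 << k
        block = half << 1
        bit_k = 0
        for base in range(0, 64, block):
            top = base + block - 1
            low = base + half - 1
            if (fill >> top) & 1 and not (fill >> low) & 1:
                bit_k = 1
        result |= bit_k << k

    return result
```

In this model, bit 5 depends on only two bits of `fill` and bit 4 on four, whereas deriving the same bits from `onehot` would need a 32-way OR each, which is consistent with the LUT saving the commit message reports.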

1067 Commits (0aa898c7a6975732f21481487ae44fa7cd5d3503)