microwatt

Commit Graph

Author	SHA1	Message	Date
Benjamin Herrenschmidt	48f260761b	writeback: Slightly improve timing The CR update currently depends on the complete data formatting mux chain. This makes it source its inputs from a bit earlier in the chain, thus improving timing a bit Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	365f60b693	simple_ram: Turn on pipelining With a 1 cycle delay Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	c22734d0d9	wb_debug: Add wishbone pipelining support Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	3df018cdc0	icache: Add wishbone pipelining support Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	d363daa692	dcache: Add wishbone pipelining support Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	e638c3e8ae	fpga/bram: Generate stall signal This doesn't yet pipeline the block RAM, just generate a valid stall signal so it's compatible with a pipelined master Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	37acb35773	simple_ram: Add pipelining support The generic PIPELINE_DEPTH can be set to 0 to keep it operating as a non-pipelined slave, or a larger value indicating the amount of extra cycles between requests and acks. It will always generate a valid stall signal, so it can be used in either mode with a pipelined master (but only in non-pipelined mode with a non-pipelined master). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	df1a9237f6	intercon: Generate stall signals for non-pipelined slaves So far the UART and the "miss" case. Memory will be pipelined Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	7a4a9b6377	wb_arbiter: Forward stall signals They are set to '1' for non-selected devices Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	b1424e859e	icache_tb: Initialize stop_mark Too much red in gtkwave.. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	79101041d6	wishbone: Add stall signal Pipelined wishbone needs it Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	559b3bcf2d	pp_uart: reformat Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Anton Blanchard	9620a76281	Merge pull request #115 from antonblanchard/reduce-wishbone Reduce wishbone	7 years ago
Anton Blanchard	247d7d4aa0	Merge pull request #113 from mikey/exec-sim-remove Remove SIM generic from execute1	7 years ago
Anton Blanchard	1b6c246379	Merge pull request #114 from antonblanchard/dcache Dcache from Ben	7 years ago
Michael Neuling	bd4ac06243	Remove SIM generic from execute1 This does nothing, so remove. Signed-off-by: Michael Neuling <mikey@neuling.org>	7 years ago
Benjamin Herrenschmidt	6dd0b514ac	Reduce wishbone address size to 32-bit For now ... it reduces the routing pressure on the FPGA This needs manual adjustment of the address decoder in soc.vhdl, at least until I can figure out how to deal with std_match Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> # Conflicts: # soc.vhdl # Conflicts: # soc.vhdl	7 years ago
Benjamin Herrenschmidt	1a63c39704	Make it possible to change wishbone address size All that needs to be changed now is the size in wishbone_types.vhdl and the address decoder in soc.vhdl Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	cb4451498f	dcache: Add testbench A very simple one for now... Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	742b21480e	insn: Simplistic implementation of icbi We don't yet have a proper snooper for the icache, so for now make icbi just flush the whole thing Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	a0d95e791e	insn: Implement isync instruction The instruction works by redirecting fetch to nia+4 (hopefully using the same adder used to generate LR) and doing a backflush. Along with being single issue, this should guarantee that the next instruction only gets fetched after the pipe's been emptied. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	6e0ee0b0db	icache & dcache: Fix store way variable We used the variable "way" in the wrong state in the cache when updating a line valid bit after the end of the wishbone transactions, we need to use the latched "store_way". Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	587a5e3c45	dcache: Cleanup (mostly cosmetic) Clearly separate the 2 stages of load hits, improve naming and comments, clarify the writeback controls etc... Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	265fbf894b	icache/dcache: Make both caches 32 lines, 2 ways Adding lines seems to add only little extra as the BRAMs aren't full, 2 ways is our current compromise to limit pressure on small FPGAs. We could go to 64 lines for a little more, but timing is becoming a bit too tight for my liking on the tags/LRU path of the icache, so let's leave it at 32 for now. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	174378b190	dcache: Introduce an extra cycle latency to make timing This makes the BRAMs use an output buffer, introducing an extra cycle latency. Without this, Vivado won't make timing at 100MHz. We stash all the necessary response data in delayed latches, the extra cycle is NOT a state in the state machine, thus it's fully pipelined and doesn't involve stalling. This introduces an extra non-pipelined cycle for loads with update to avoid collision on the writeback output between the now delayed load data and the register update. We could avoid it by moving the register update in the pipeline bubble created by the extra update state, but it's a bit trickier, so I leave that for a later optimization. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	b513f0fb48	dcache: Add a dcache This replaces loadstore2 with a dcache The dcache unit is loosely based on the icache one (same basic cache layout), but has some significant logic additions to deal with stores, loads with update, non-cacheable accesses and other differences due to operating in the execution part of the pipeline rather than the fetch part. The cache is store-through, though a hit with an existing line will update the line rather than invalidate it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	7b3df7cb05	icache: Reduce simulation warnings This might slightly increase the logic in synthesis but avoids us looking at uninitialized tags when not servicing an active request Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	a38ae503ff	cache_ram: Add write-enables They will be needed by the dcache Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	e598188aca	plru: Improve sensitivity list Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Anton Blanchard	b963f8a6af	Merge pull request #112 from hughhalf/patch-1 Minor tweaks to README.md	7 years ago
Hugh	96b7f17e52	Minor tweaks to README.md Few tweaks based on a newcomer's experience getting an Arty A7-100 up and running Forgot to add DCO in initial PR, now corrected. Signed-off-by: Hugh Blemings <hugh@blemings.org>	7 years ago
Anton Blanchard	326dec4b3b	Merge pull request #110 from antonblanchard/misc icache_tb: Improve test and include test file	7 years ago
Benjamin Herrenschmidt	f74e8a4f79	icache_tb: Improve test and include test file The icache_test.bin file was missing. This adds it (along with a python3 script to generate it). We also add better reporting on errors Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Anton Blanchard	900c131083	Merge pull request #109 from antonblanchard/misc Misc updates from Ben	7 years ago
Anton Blanchard	e67924f55e	isel takes a CR bit, not a CR field Fix a GHDL assert in isel. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Benjamin Herrenschmidt	60b05ee1e5	common: Reformat No code change Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	bddc9327cc	execute1: Remove mux on "write_data" and "rc" outputs Only "write_enable" needs to change, this shrinks the core a bit more Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	da0bd89c43	crhelpers: Constrain "crnum" integer This seems to save quite a few LUTs Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	4437487ad0	execute1: Reformat No functional change Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	858b1e7930	writeback: Remove a mux leg on data_in Initialize to 0 forces the mux to have an extra leg fed with zeros. Instead initialize data_in to one of the mux inputs Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Anton Blanchard	4433118c91	Merge pull request #105 from paulusmack/writeback Writeback	7 years ago
Paul Mackerras	57b200d6cb	writeback: Eliminate inferred latch This initializes data_in to all zeroes so that it doesn't become a set of 64 inferred latches. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	640af89e72	Merge pull request #106 from paulusmack/master wishbone_debug_master: Improve timing	7 years ago
Paul Mackerras	a27ed0ec27	wishbone_debug_master: Improve timing The current code has the possibility that we could set reg_addr or reg_ctrl and then increment reg_addr in the same cycle, resulting in some long timing paths. Rearrange the code to make it clear that we are not trying to add an auto-increment to data from outside the module; in any given cycle we either set one of reg_addr and reg_ctrl, or we possibly increment reg_addr. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Paul Mackerras	f49a5a99a5	Remove execute2 stage Since the condition setting got moved to writeback, execute2 does nothing aside from wasting a cycle. This removes it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	63f5dce820	Merge pull request #104 from paulusmack/master Implement neg using OP_ADD	7 years ago
Paul Mackerras	9646fe28b0	Do sign-extension instructions in writeback instead of execute1 This makes the exts[bhw] instructions do the sign extension in the writeback stage using the sign-extension logic there instead of having unique sign extension logic in execute1. This requires passing the data length and sign extend flag from decode2 down through execute1 and execute2 and into writeback. As a side bonus we reduce the number of values in insn_type_t by two. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Paul Mackerras	374f4c536d	writeback: Do data formatting and condition recording in writeback This adds code to writeback to format data and test the result against zero for the purpose of setting CR0. The data formatter is able to shift and mask by bytes and do byte reversal and sign extension. It can also put together bytes from two input doublewords to support unaligned loads (including unaligned byte-reversed loads). The data formatter starts with an 8:1 multiplexer that is able to direct any byte of the input to any byte of the output. This lets us rotate the data and simultaneously byte-reverse it. The rotated/reversed data goes to a register for the unaligned cases that overlap two doublewords. Then there is per-byte logic that does trimming, sign extension, and splicing together bytes from a previous input doubleword (stored in data_latched) and the current doubleword. Finally the 64-bit result is tested to set CR0 if rc = 1. This removes the RC logic from the execute2, multiply and divide units, and the shift/mask/byte-reverse/sign-extend logic from loadstore2. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	45271acb35	Merge pull request #103 from paulusmack/divider Divider	7 years ago
Paul Mackerras	86c53aa3f7	Implement neg using OP_ADD We have all the machinery in place to implement the neg instruction as OP_ADD. Doing that means we can ditch OP_NEG, and saves about 66 slice LUTs on the A7-100. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago

1026 Commits (4b1a413a2fe7ac6657295a4a7a7d10b4596d61b4)