microwatt

Commit Graph

Author	SHA1	Message	Date
Benjamin Herrenschmidt	a0d95e791e	insn: Implement isync instruction The instruction works by redirecting fetch to nia+4 (hopefully using the same adder used to generate LR) and doing a backflush. Along with being single issue, this should guarantee that the next instruction only gets fetched after the pipe's been emptied. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	6e0ee0b0db	icache & dcache: Fix store way variable We used the variable "way" in the wrong state in the cache when updating a line valid bit after the end of the wishbone transactions, we need to use the latched "store_way". Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	587a5e3c45	dcache: Cleanup (mostly cosmetic) Clearly separate the 2 stages of load hits, improve naming and comments, clarify the writeback controls etc... Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	265fbf894b	icache/dcache: Make both caches 32 lines, 2 ways Adding lines seems to add only little extra as the BRAMs aren't full, 2 ways is our current compromise to limit pressure on small FPGAs. We could go to 64 lines for a little more, but timing is becoming a bit too tight for my liking on the tags/LRU path of the icache, so let's leave it at 32 for now. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	174378b190	dcache: Introduce an extra cycle latency to make timing This makes the BRAMs use an output buffer, introducing an extra cycle latency. Without this, Vivado won't make timing at 100MHz. We stash all the necessary response data in delayed latches, the extra cycle is NOT a state in the state machine, thus it's fully pipelined and doesn't involve stalling. This introduces an extra non-pipelined cycle for loads with update to avoid collision on the writeback output between the now delayed load data and the register update. We could avoid it by moving the register update in the pipeline bubble created by the extra update state, but it's a bit trickier, so I leave that for a later optimization. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	b513f0fb48	dcache: Add a dcache This replaces loadstore2 with a dcache. The dcache unit is loosely based on the icache one (same basic cache layout), but has some significant logic additions to deal with stores, loads with update, non-cachable accesses and other differences due to operating in the execution part of the pipeline rather than the fetch part. The cache is store-through, though a hit with an existing line will update the line rather than invalidate it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	7b3df7cb05	icache: Reduce simulation warnings This might slightly increase the logic in synthesis but avoids us looking at uninitialized tags when not servicing an active request Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	a38ae503ff	cache_ram: Add write-enables They will be needed by the dcache Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	e598188aca	plru: Improve sensitivity list Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Anton Blanchard	b963f8a6af	Merge pull request #112 from hughhalf/patch-1 Minor tweaks to README.md	7 years ago
Hugh	96b7f17e52	Minor tweaks to README.md Few tweaks based on a newcomer's experience getting an Arty A7-100 up and running Forgot to add DCO in initial PR, now corrected. Signed-off-by: Hugh Blemings <hugh@blemings.org>	7 years ago
Anton Blanchard	326dec4b3b	Merge pull request #110 from antonblanchard/misc icache_tb: Improve test and include test file	7 years ago
Benjamin Herrenschmidt	f74e8a4f79	icache_tb: Improve test and include test file The icache_test.bin file was missing. This adds it (along with a python3 script to generate it). We also add better reporting on errors Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Anton Blanchard	900c131083	Merge pull request #109 from antonblanchard/misc Misc updates from Ben	7 years ago
Anton Blanchard	e67924f55e	isel takes a CR bit, not a CR field Fix a GHDL assert in isel. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Benjamin Herrenschmidt	60b05ee1e5	common: Reformat No code change Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	bddc9327cc	execute1: Remove mux on "write_data" and "rc" outputs Only "write_enable" needs to change, this shrinks the core a bit more Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	da0bd89c43	crhelpers: Constrain "crnum" integer This seems to save quite a few LUTs Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	4437487ad0	execute1: Reformat No functional change Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	858b1e7930	writeback: Remove a mux leg on data_in Initialize to 0 forces the mux to have an extra leg fed with zeros. Instead initialize data_in to one of the mux inputs Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Anton Blanchard	4433118c91	Merge pull request #105 from paulusmack/writeback Writeback	7 years ago
Paul Mackerras	57b200d6cb	writeback: Eliminate inferred latch This initializes data_in to all zeroes so that it doesn't become a set of 64 inferred latches. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	640af89e72	Merge pull request #106 from paulusmack/master wishbone_debug_master: Improve timing	7 years ago
Paul Mackerras	a27ed0ec27	wishbone_debug_master: Improve timing The current code has the possibility that we could set reg_addr or reg_ctrl and then increment reg_addr in the same cycle, resulting in some long timing paths. Rearrange the code to make it clear that we are not trying to add an auto-increment to data from outside the module; in any given cycle we either set one of reg_addr and reg_ctrl, or we possibly increment reg_addr. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Paul Mackerras	f49a5a99a5	Remove execute2 stage Since the condition setting got moved to writeback, execute2 does nothing aside from wasting a cycle. This removes it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	63f5dce820	Merge pull request #104 from paulusmack/master Implement neg using OP_ADD	7 years ago
Paul Mackerras	9646fe28b0	Do sign-extension instructions in writeback instead of execute1 This makes the exts[bhw] instructions do the sign extension in the writeback stage using the sign-extension logic there instead of having unique sign extension logic in execute1. This requires passing the data length and sign extend flag from decode2 down through execute1 and execute2 and into writeback. As a side bonus we reduce the number of values in insn_type_t by two. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Paul Mackerras	374f4c536d	writeback: Do data formatting and condition recording in writeback This adds code to writeback to format data and test the result against zero for the purpose of setting CR0. The data formatter is able to shift and mask by bytes and do byte reversal and sign extension. It can also put together bytes from two input doublewords to support unaligned loads (including unaligned byte-reversed loads). The data formatter starts with an 8:1 multiplexer that is able to direct any byte of the input to any byte of the output. This lets us rotate the data and simultaneously byte-reverse it. The rotated/reversed data goes to a register for the unaligned cases that overlap two doublewords. Then there is per-byte logic that does trimming, sign extension, and splicing together bytes from a previous input doubleword (stored in data_latched) and the current doubleword. Finally the 64-bit result is tested to set CR0 if rc = 1. This removes the RC logic from the execute2, multiply and divide units, and the shift/mask/byte-reverse/sign-extend logic from loadstore2. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	45271acb35	Merge pull request #103 from paulusmack/divider Divider	7 years ago
Paul Mackerras	86c53aa3f7	Implement neg using OP_ADD We have all the machinery in place to implement the neg instruction as OP_ADD. Doing that means we can ditch OP_NEG, and saves about 66 slice LUTs on the A7-100. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Paul Mackerras	82c19d4e7a	divider: Reduce delay in detecting 32-bit overflow Timing analysis showed that even with the output register, timing was still a bit tight in the output stage, where the carry has to propagate all the way through the 64-bit negater, and we were then testing the top 33 bits to determine if a 32-bit operation had overflowed. Instead of detecting overflow at the end, we watch for any 1 bits getting shifted into the top 32 bits of the quotient register as we are doing the division. That is relatively easy to do and simplifies the output stage. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	6c4edf80ae	Merge pull request #102 from antonblanchard/gpr-hazard-5-c Add CR hazard detection	7 years ago
Anton Blanchard	813f834012	Add CR hazard detection To keep things simple we treat the CR as a single entity. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	58b348deae	Merge pull request #101 from antonblanchard/gpr-hazard-5-b Add GPR hazard detection	7 years ago
Paul Mackerras	c7025f9f28	divider: Add an output register This puts the output of the divider through a register. With the addition of the logic to detect overflow, the combinatorial output logic of the divider was becoming a critical path. Adding the output register adds a cycle to the latency of the divider but helps make timing at 100MHz on the A7-100. This also makes the valid, write_reg_enable and write_cr_enable fields of the output be registered, which eliminates warnings about register/latch pins with no clock. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	bb65d0b899	Remove issue restrictions on a number of instructions Anything that isn't a load or store and anything that doesn't read the CR can go as soon as its inputs are ready. While we could also allow SPR read/write and carry read/write, we plan to change them to be read in decode2 and written in writeback soon and they will need separate hazard detection to be added. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	bdc26b7527	Add GPR hazard detection Check GPRs against any writers in the pipeline. All instructions are still marked single in pipeline at this stage. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	e4c98dce36	Merge pull request #100 from antonblanchard/gpr-hazard-5-a Separate issue control into its own unit	7 years ago
Anton Blanchard	f181bf31e2	Merge pull request #99 from paulusmack/logical Logical	7 years ago
Anton Blanchard	d5346d0abf	Separate issue control into its own unit Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Paul Mackerras	4396eddc31	countzero: Add a testbench Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Paul Mackerras	e527e3a9b7	countzero: Reorganize to have fewer levels of logic and fewer LUTs By using 4:1 multiplexers rather than 2:1, this cuts the number of levels of multiplexing from 4 to 2 and also reduces the total number of slice LUTs required. Because we are now handling 4 bits at each level, including the bottom level, the logic to do the priority encoding can be factored out into a function that is used at each level. This rearranges the logic so that the encoding and selection of bits is done whether or not the input operand is zero, and the if statement testing whether the input is zero only affects what is assigned to result. With this we don't get the inferred latches and we can go back to using signals rather than variables. Also add some comments about what is being done. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	0a0fe03767	Merge pull request #98 from antonblanchard/fix-mod mod* doesn't have an RC form	7 years ago
Anton Blanchard	10a990bba8	mod* doesn't have an RC form The RC bit should be ignored for mod* instructions. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	56908edea2	Merge pull request #96 from antonblanchard/clk_gen_bypass-fix Fix clk_gen_bypass	7 years ago
Anton Blanchard	6cdb8ca9f5	Fix clk_gen_bypass clk_gen_bypass needed updating after the addition of CLK_INPUT_HZ and CLK_OUTPUT_HZ. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	8530500a71	Merge pull request #94 from antonblanchard/icbi-nop decode: Handle icbi	7 years ago
Anton Blanchard	854c93f970	Merge pull request #93 from antonblanchard/fifo-fix Remove shared variable from fifo, and reformat	7 years ago
Anton Blanchard	c41da84226	decode: Handle icbi We will need a proper handler for icbi, but in the meantime treat it as a nop. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	7aaed5abd5	fifo: Reformat Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
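
The countzero reorganization in commit e527e3a9b7 (4 bits handled at each level, one shared priority-encoding function, zero test applied only to the final result) can be illustrated with a small behavioral model. This is a Python sketch written for illustration only, not the VHDL from the repository; the names `encode4` and `count_leading_zeros64` are hypothetical, and the sketch assumes a 64-bit count-leading-zeros operation.

```python
def encode4(group):
    """Priority-encode four flags, most significant first.

    Returns the index (0..3) of the first set flag, or 0 when none is set.
    Hypothetical helper; it stands in for the shared encoding function.
    """
    for index, flag in enumerate(group):
        if flag:
            return index
    return 0


def count_leading_zeros64(value):
    """Count leading zeros of a 64-bit value using radix-4 encoding steps."""
    value &= (1 << 64) - 1
    bits = [(value >> (63 - i)) & 1 for i in range(64)]  # bits[0] is the MSB

    # Bottom level: one "any bit set" flag and one 2-bit position per nibble.
    nibble_any = [any(bits[4 * i:4 * i + 4]) for i in range(16)]
    nibble_pos = [encode4(bits[4 * i:4 * i + 4]) for i in range(16)]

    # Middle level: the same encoder applied to groups of four nibble flags.
    group_any = [any(nibble_any[4 * i:4 * i + 4]) for i in range(4)]
    group_pos = [encode4(nibble_any[4 * i:4 * i + 4]) for i in range(4)]

    # Top level: pick the first non-empty group, then make two 4:1 selections.
    top = encode4(group_any)
    mid = group_pos[top]
    low = nibble_pos[4 * top + mid]
    count = 16 * top + 4 * mid + low

    # Encoding and selection always run; the zero test only picks the result.
    return 64 if value == 0 else count
```

The two indexed lookups (`group_pos[top]` and `nibble_pos[4 * top + mid]`) play the role of the two levels of 4:1 multiplexing mentioned in the commit message, under the assumptions stated above.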

706 Commits (8bfc6a21b97fea937058491362b0f173cc2e62f8)