microwatt

Commit Graph

Author	SHA1	Message	Date
Anton Blanchard	f77b31a552	Merge pull request #134 from paulusmack/master Add bypass from execute1 output to input	6 years ago
Anton Blanchard	c18830a5e5	Add an option to use Docker Some distros don't have a version of ghdl with the LLVM or GCC backend, so add a Docker image as an alternative. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Anton Blanchard	a4dbbfda4a	Fix Makefile dependency issue with files in vhdl/* GHDL doesn't seem to have a way to specify the location of the object file it writes, so right now they are all ending up in the root directory. The Makefile rules did not reflect that, so make would continually rebuild the files in fpga/*. Fix the rules to match what GHDL is doing. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Paul Mackerras	39d18d2738	Make divider hang off the side of execute1 With this, the divider is a unit that execute1 sends operands to and which sends its results back to execute1, which then sends them to writeback. Execute1 now sends a stall signal when it gets a divide or modulus instruction until it gets a valid signal back from the divider. Divide and modulus instructions are no longer marked as single-issue. The data formatting step that used to be done in decode2 for div and mod instructions is now done in execute1. We also do the absolute value operation in that same cycle instead of taking an extra cycle inside the divider for signed operations with a negative operand. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Paul Mackerras	2167186b5f	Make multiplier hang off the side of execute1 With this, the multiplier isn't a separate pipe that decode2 issues instructions to, but rather is a unit that execute1 sends operands to and which sends the result back to execute1, which then sends it to writeback. Execute1 now sends a stall signal when it gets a multiply instruction until it gets a valid signal back from the multiplier. This all means that we no longer need to mark the multiply instructions as single-issue. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Benjamin Herrenschmidt	e4f475e17f	sprs: Store common SPRs in register file This stores the most common SPRs in the register file. This includes CTR and LR and a not yet final list of others. The register file is set to 64 entries for now. Specific types are defined that can represent a GPR index (gpr_index_t) or a GPR/SPR index (gspr_index_t) along with conversion functions between the two. In order to deal with some forms of branch updating both LR and CTR, we introduced a delayed update of LR after a branch link. Note: We currently stall the pipeline on such a delayed branch, but we could avoid stalling fetch in that specific case as we know we have a branch delay. We could also limit that to the specific case where we need to update both CTR and LR. This allows us to make bcreg, mtspr and mfspr pipelined. decode1 will automatically force the single issue flag on mfspr/mtspr to a "slow" SPR. [paulus@ozlabs.org - fix direction of decode2.stall_in] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Benjamin Herrenschmidt	8e0389b973	ram: Rework main RAM interface This replaces the simple_ram_behavioural and mw_soc_memory modules with a common wishbone_bram_wrapper.vhdl that interfaces the pipelined WB with a lower-level RAM module, along with FPGA and sim variants of the latter. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	9a63c098a5	Move log2/ispow2 to a utils package (Out of icache and dcache) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	cb4451498f	dcache: Add testbench A very simple one for now... Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	b513f0fb48	dcache: Add a dcache This replaces loadstore2 with a dcache. The dcache unit is loosely based on the icache one (same basic cache layout), but has some significant logic additions to deal with stores, loads with update, non-cachable accesses and other differences due to operating in the execution part of the pipeline rather than the fetch part. The cache is store-through, though a hit with an existing line will update the line rather than invalidate it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Paul Mackerras	f49a5a99a5	Remove execute2 stage Since the condition setting got moved to writeback, execute2 does nothing aside from wasting a cycle. This removes it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Paul Mackerras	374f4c536d	writeback: Do data formatting and condition recording in writeback This adds code to writeback to format data and test the result against zero for the purpose of setting CR0. The data formatter is able to shift and mask by bytes and do byte reversal and sign extension. It can also put together bytes from two input doublewords to support unaligned loads (including unaligned byte-reversed loads). The data formatter starts with an 8:1 multiplexer that is able to direct any byte of the input to any byte of the output. This lets us rotate the data and simultaneously byte-reverse it. The rotated/reversed data goes to a register for the unaligned cases that overlap two doublewords. Then there is per-byte logic that does trimming, sign extension, and splicing together bytes from a previous input doubleword (stored in data_latched) and the current doubleword. Finally the 64-bit result is tested to set CR0 if rc = 1. This removes the RC logic from the execute2, multiply and divide units, and the shift/mask/byte-reverse/sign-extend logic from loadstore2. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	813f834012	Add CR hazard detection To keep things simple we treat the CR as a single entity. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	bdc26b7527	Add GPR hazard detection Check GPRs against any writers in the pipeline. All instructions are still marked single in pipeline at this stage. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	e4c98dce36	Merge pull request #100 from antonblanchard/gpr-hazard-5-a Separate issue control into its own unit	7 years ago
Anton Blanchard	d5346d0abf	Separate issue control into its own unit Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Paul Mackerras	4396eddc31	countzero: Add a testbench Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	3c6e66dc96	Merge pull request #83 from paulusmack/logical execute: Consolidate count-leading/trailing-zeroes implementations	7 years ago
Anton Blanchard	4b7b702e01	Merge pull request #81 from antonblanchard/logical Consolidate logical instructions	7 years ago
Paul Mackerras	24a4a796ce	execute: Consolidate count-leading/trailing-zeroes implementations This adds combinatorial logic that does 32-bit and 64-bit count leading and trailing zeroes in one unit, and consolidates the four instructions under a single OP_CNTZ opcode. This saves 84 slice LUTs on the Arty A7-100. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	b8fb721b81	Consolidate logical instructions Consolidate and/andc/nand, or/orc/nor and xor/eqv, using a common invert on the input and output. This saves us about 200 LUTs. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Benjamin Herrenschmidt	b56b46b7d1	icache: Set associative icache This adds support for set associativity to the icache. It can still be direct mapped by setting NUM_WAYS to 1. The replacement policy uses a simple tree-PLRU for each set. This is only lightly tested, tests pass but I have to double check that we are using the ways effectively and not creating duplicates. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	004eb074c9	plru: Add a simple PLRU module Tested in sim only for now Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Paul Mackerras	f7c393ba7e	Add a rotate/mask/shift unit and use it in execute1 This adds a new entity 'rotator' which contains combinatorial logic for rotating and masking 64-bit values. It implements the operations of the rlwinm, rlwnm, rlwimi, rldicl, rldicr, rldic, rldimi, rldcl, rldcr, sld, slw, srd, srw, srad, sradi, sraw and srawi instructions. It consists of a 3-stage 64-bit rotator using 4:1 multiplexors at each stage, two mask generators, output logic and control logic. The insn_type_t values used for these instructions have been reduced to just 5: OP_RLC, OP_RLCL and OP_RLCR for the rotate and mask instructions (clear both left and right, clear left, clear right variants), OP_SHL for left shifts, and OP_SHR for right shifts. The control signals for the rotator are derived from the opcode and from the is_32bit and is_signed fields of the decode_rom_t. The rotator is instantiated as an entity in execute1 so that we can be sure we only have one of it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Paul Mackerras	c9e92483b8	decode: Push mtspr/mfspr register decoding down into execute1 Instead of doing mfctr, mflr, mftb, mtctr, mtlr as separate ops, just pass down mfspr and mtspr ops with the spr number and let execute1 decode which SPR we're addressing. This will help reduce the number of instruction bits decode1 needs to look at. In fact we now pass down the whole instruction from decode2 to execute1. We will need more bits of the instruction in future, and the tools should just optimize away any that we don't end up using. Since the 'aa' bit was just a copy of an instruction bit, we can now remove it from the record. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Benjamin Herrenschmidt	586abb70a0	Update dependency Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Anton Blanchard	26f70264b3	Update Makefile dependencies Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	b57325ce29	Merge branch 'divider' of https://github.com/paulusmack/microwatt	7 years ago
Paul Mackerras	d5bc6c8824	Add a divider unit and a testbench for it This adds a divider unit, connected to the core in much the same way that the multiplier unit is connected. The division algorithm is very simple-minded, taking 64 clock cycles for any division (even 32-bit division instructions). The decoding is simplified by making use of regularities in the instruction encoding for div* and mod* instructions. Instead of having PPC_* encodings from the first-stage decoder for each of the different div* and mod* instructions, we now just have PPC_DIV and PPC_MOD, and the inputs to the divider that indicate what sort of division operation to do are derived from instruction word bits. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Benjamin Herrenschmidt	42d802bed0	Add distclean to Makefile Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	98f0994698	Add core debug module This module adds some simple core controls: reset, stop, start, step along with icache clear and reading the NIA and core status bits Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	b46f81fae4	Wishbone debug module This adds a debug module off the DMI (debug) bus which can act as a wishbone master to generate read and write cycles. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	ee52fd4d80	Add a debug (DMI) bus and a JTAG interface to it on Xilinx FPGAs This adds a simple bus that can be mastered from an external system via JTAG, which will be used to hook up various debug modules. It's loosely based on the RiscV model (hence the DMI name). The module currently only supports hooking up to a Xilinx BSCANE2 but it shouldn't be too hard to adapt it to support different TAPs if necessary. The JTAG protocol proper is not exactly the RiscV one at this point, though I might still change it. This comes with some sim variants of Xilinx BSCANE2 and BUFG and a test bench. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Anton Blanchard	135805d2ac	Merge pull request #61 from antonblanchard/execute-cleanup execute1 no longer needs sim_console	7 years ago
Anton Blanchard	6d85920068	execute1 no longer needs sim_console Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	1b6eef2a5d	Fix multiply_tb Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	1e3e16e500	Add an icache testbench Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	89849a6856	Add a simple direct mapped icache Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	b6b2c78163	Update Makefile dependencies Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	3ac1dbc737	Share soc.vhdl between FPGA and sim Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	8bfd6e5eae	Use simulated UART in core test bench Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Anton Blanchard	03fd06deaf	Rework SOC reset The old reset code was overly complicated and never worked properly. Replace it with a simpler sequence that uses a couple of shift registers to assert resets: - Wait a number of external clock cycles before removing reset from the PLL. - After the PLL locks and the external reset button isn't pressed, wait a number of PLL clock cycles before removing reset from the SOC. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	5e140298a5	Rework decode2 The decode2 stage was spaghetti code and needed cleaning up. Create a series of functions to pull fields from a ppc instruction and also a series of helpers to extract values for the execution units. As suggested by Paul, we should pass all signals to the execution units and only set the valid signal conditionally, which should use less resources. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	f98370f9e6	Merge pull request #5 from antonblanchard/travis-test Add an initial travis.yml	7 years ago
Anton Blanchard	2ee269abdb	Add an initial travis.yml Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	96787091a6	Add -Wall to CFLAGS Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	5a29cb4699	Initial import of microwatt Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago

97 Commits (5cc5d8f030d303a82ed38f3b359921a748b746ba)