microwatt

Commit Graph

Author	SHA1	Message	Date
Paul Mackerras	96b402a4bf	Consolidate add/subtract instructions into a single op All of the PPC add and subtract instructions, including carrying and extended versions, do much the same arithmetic operation: result = (I xor A) + B + C where A is the value from RA, I provides a logical inversion of A (i.e. I is 0 or -1), B is either from RB or is a constant 0 or -1, and C is 0, 1 or the carry bit from XER (CA). To consolidate all the add/subtract instructions into a single OP_ADD, we add a column to decode_rom_t to indicate when A should be inverted, and change the input_carry field to a 3-state selector to select C in the equation above. This also adds a new "CONST_M1" value for input_reg_b_t to indicate that B is a constant -1. This allows us to implement addme and subfme. The addex instruction appears not to exist, so the comments referring to it are removed. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	58b06eb5f3	decode: Remove const fields from decode_rom_t The const* fields of decode_rom_t drove multiplexers in decode2 that picked out various instruction fields and put them into the const* fields of the Decode2ToExecute1Type record, from where they were used in execute1. However, the code in execute1 can just as easily use the appropriate fields of the original instruction word, since that is now available in execute1. This therefore changes the code to do that, resulting in smaller decode tables. Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	c9e92483b8	decode: Push mtspr/mfspr register decoding down into execute1 Instead of doing mfctr, mflr, mftb, mtctr, mtlr as separate ops, just pass down mfspr and mtspr ops with the spr number and let execute1 decode which SPR we're addressing. This will help reduce the number of instruction bits decode1 needs to look at. In fact we now pass down the whole instruction from decode2 to execute1. We will need more bits of the instruction in future, and the tools should just optimize away any that we don't end up using. Since the 'aa' bit was just a copy of an instruction bit, we can now remove it from the record. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Benjamin Herrenschmidt	3e6f656a90	Add MCRF instruction Hopefully it's not too timing catastrophic. The variable newcrf will be handy for the other CR ops when we implement them I suspect. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	554ae88540	Implement absolute branches Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Paul Mackerras	25b9450475	divider: Do absolute-value ops in divider instead of decode This moves the negation of negative operands for signed divide and modulus operations out of the decode2 stage and into the divider. If either of the operands for a signed divide or modulus operation is negative, the divider now takes an extra cycle to negate the operands that are negative. The interface to the divider now has an 'is_signed' signal rather than a 'neg_result' signal, and the dividend and divisor can be negative, so divider_tb had to be updated for the new interface. The reason for doing this is that one of the worst timing violations on the Arty A7-100 at 100MHz involved the carry chain in the adders that did the negation of the dividend and divisor in the decode stage. Moving the negations to a separate cycle fixes that and also seems to reduce the total number of slice LUTs used. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Anton Blanchard	b57325ce29	Merge branch 'divider' of https://github.com/paulusmack/microwatt	5 years ago
Paul Mackerras	d5bc6c8824	Add a divider unit and a testbench for it This adds a divider unit, connected to the core in much the same way that the multiplier unit is connected. The division algorithm is very simple-minded, taking 64 clock cycles for any division (even 32-bit division instructions). The decoding is simplified by making use of regularities in the instruction encoding for div* and mod* instructions. Instead of having PPC_* encodings from the first-stage decoder for each of the different div* and mod* instructions, we now just have PPC_DIV and PPC_MOD, and the inputs to the divider that indicate what sort of division operation to do are derived from instruction word bits. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Benjamin Herrenschmidt	98f0994698	Add core debug module This module adds some simple core controls: reset, stop, start, step along with icache clear and reading the NIA and core status bits Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org	5 years ago
Anton Blanchard	d813ffb748	Fix spurious outstanding assert Check it in the sequential process, not the combinatorial one. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	80aa781454	Merge pull request #47 from antonblanchard/if-fix Explicitly check against '1' in if statements	5 years ago
Anton Blanchard	b9e28598b4	Explicitly check against '1' in if statements nvc doesn't like what I think is a VHDL 2008 construct. Lets just check against '1' explicitly. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	1d00c75ecc	Remove nia from loadstore and multiply Neither unit needs the NIA, so remove it. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	a8f8c54b77	Move debug execute output into decode2 This covers all units, and we avoid double printing. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	92a7152370	Rework pipeline, add stall and flush signals This adds stall and flush signals to the pipeline. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	a1ab1d3e56	Merge pull request #25 from antonblanchard/register_file_printing Clean up register read debug output	5 years ago
Anton Blanchard	04eb9583e6	Clean up register read debug output Right now we continually print all 3 possible GPRs an instruction may be using. Add signals so we only print GPRs when they are actually read. This should hopefully optimise away when synthesized. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	9fbaea6f08	Rework CR file and add forwarding Handle the CR as a single field with per nibble enables. Forward any writes in the same cycle. If this proves to be an issue for timing, we may want to revisit this in the future. For now, it keeps things simple. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	5e140298a5	Rework decode2 The decode2 stage was spaghetti code and needed cleaning up. Create a series of functions to pull fields from a ppc instruction and also a series of helpers to extract values for the execution units. As suggested by Paul, we should pass all signals to the execution units and only set the valid signal conditionally, which should use less resources. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	5a29cb4699	Initial import of microwatt Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago

1 2

70 Commits (faab169307285af3e74b6b817b2fa0de480346d3)