microwatt

Commit Graph

Author	SHA1	Message	Date
Anton Blanchard	bf96279ff1	Reformat countzero Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	4 years ago
Paul Mackerras	9d285a265c	core: Add support for single-precision FP loads and stores This adds code to loadstore1 to convert between single-precision and double-precision formats, and implements the lfs* and stfs* instructions. The conversion processes are described in Power ISA v3.1 Book 1 sections 4.6.2 and 4.6.3. These conversions take one cycle, so lfs* and stfs* are one cycle slower than lfd* and stfd*. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	03a3a5d326	countzero: Faster algorithm for count leading/trailing zeroes This uses an algorithm for count leading/trailing zeroes that is faster on FPGAs, which makes timing easier. cntlz* and cnttz* still take two cycles, though. For count trailing zeroes, we compute x & -x, which for non-zero x has a single 1 bit in the position of the least-significant 1 bit in x. This one-hot representation can then be converted to a bit number with six 32-input OR gates. For count leading zeroes, we simply do a bit-reversal on x and then use the same algorithm. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	e08ca4ab8e	countzero: Add a register to help make timing This adds a register in the middle of the countzero computation, so that we now have two cycles to count leading or trailing zeroes instead of just one. Execute1 now outputs a one-cycle stall signal when it encounters a cntlz* or cnttz* instruction. With this, the countzero path no longer fails timing on the Artix-7 at 100MHz. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	e527e3a9b7	countzero: Reorganize to have fewer levels of logic and fewer LUTs By using 4:1 multiplexers rather than 2:1, this cuts the number of levels of multiplexing from 4 to 2 and also reduces the total number of slice LUTs required. Because we are now handling 4 bits at each level, including the bottom level, the logic to do the priority encoding can be factored out into a function that is used at each level. This rearranges the logic so that the encoding and selection of bits is done whether or not the input operand is zero, and the if statement testing whether the input is zero only affects what is assigned to result. With this we don't get the inferred latches and we can go back to using signals rather than variables. Also add some comments about what is being done. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Anton Blanchard	1b559aee31	Fix count-leading/trailing-zeroes The current code simulates correctly, but produces miscompares when synthesized onto an FPGA. On closer inspection GHDL synthesis complains about inferred latches, and there do seem to be issues. Convert it to variables that are always initialized to zero at the start of the process. Fixes: `24a4a796ce` ("execute: Consolidate count-leading/trailing-zeroes implementations") Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Paul Mackerras	24a4a796ce	execute: Consolidate count-leading/trailing-zeroes implementations This adds combinatorial logic that does 32-bit and 64-bit count leading and trailing zeroes in one unit, and consolidates the four instructions under a single OP_CNTZ opcode. This saves 84 slice LUTs on the Arty A7-100. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago

7 Commits (b885ee7ed1de523fa5c5e315cbab92abe29646c4)
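
The algorithm described in commit 03a3a5d326 (isolate the lowest set bit with x & -x, convert the one-hot value to a bit number with six wide ORs, and bit-reverse the input for the leading-zero case) can be sketched in software as follows. This is a minimal Python model, not the VHDL from the repository: the function names, the 64-bit-only scope, and the explicit handling of a zero input (returning 64, as the Power ISA defines for cntlzd and cnttzd) are assumptions made for illustration. The commit message itself only describes the non-zero case.

```python
MASK64 = (1 << 64) - 1

# For each of the six result bits k, build a mask selecting the 32 input
# positions whose index has bit k set. ORing the one-hot value against
# this mask models one 32-input OR gate.
OR_MASKS = [
    sum(1 << pos for pos in range(64) if (pos >> k) & 1)
    for k in range(6)
]


def isolate_lowest_set_bit(x):
    """Return x & -x in 64-bit arithmetic (one-hot for non-zero x)."""
    return x & (-x & MASK64)


def onehot_to_index(onehot):
    """Convert a 64-bit one-hot value to its bit number using six ORs."""
    index = 0
    for k, mask in enumerate(OR_MASKS):
        if onehot & mask:  # models a 32-input OR gate
            index |= 1 << k
    return index


def bit_reverse64(x):
    """Reverse the order of the 64 bits of x."""
    return int(format(x & MASK64, "064b")[::-1], 2)


def count_trailing_zeros(x):
    x &= MASK64
    if x == 0:
        return 64  # assumption: zero input handled as a special case
    return onehot_to_index(isolate_lowest_set_bit(x))


def count_leading_zeros(x):
    # Leading zeroes of x are the trailing zeroes of the bit-reversed x.
    return count_trailing_zeros(bit_reverse64(x))
```

For example, count_trailing_zeros(0b1000) isolates the one-hot value 0b1000, whose position 3 has index bits 0 and 1 set, so only the first two ORs fire and the result is 3. The model says nothing about timing; the two-cycle latency mentioned in the commit comes from the pipeline register added in e08ca4ab8e.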