microwatt

Commit Graph

Author	SHA1	Message	Date
Paul Mackerras	3268ef717c	FPU: Make opsel_a a function of just the state This adds some extra states and transitions so that opsel_a becomes a function only of the current state. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	73505b1626	FPU: Provide a separate path for transferring A/B/C to R The timing path from r.a.class to result showed up as a critical path on the Artix-7, apparently because of transfers of A, B or C to R in special cases (e.g. NaN inputs) and the fsel instruction. To alleviate this, we provide a path via the miscellaneous value multiplexer from A, B and C to R, selected via opsel_R = RES_MISC and misc_sel = 111. A new selector opsel_sel selects which of A, B or C to transfer, using the same encoding as opsel_a. This new selector is now also used for the result class when rcls_op = RCLS_SEL and for the result sign when rsgn_op = RSGN_SEL. This reduces the number of things that opsel_a depends on and eases timing in the main adder path. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	b63773f6e9	FPU: Move computation of main adder inputs out of the state machine Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	b4aae8511d	FPU: Move special case handling to a separate process This creates a new fpu_specialcases process that handles most of the logic that was previously in the DO_NAN_INF and DO_ZERO_DEN states. What remains of those states, i.e. the handling of denormalized inputs, is in a new DO_SPECIAL state. The state machine goes into DO_SPECIAL state after IDLE for any arithmetic operation where an input is a NaN, infinity, zero or denormalized value. Doing this means that the rest of the state machine won't try to start any computation which would need to be overridden by the logic to produce the result value selected by the fpu_specialcases process. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	b1bd2aa865	FPU: Make set_r independent of multiply_to_f.valid Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	fcfdbc449c	FPU: Move condition register calculations to an explicit data path Instead of calculating v.cr_result in the state machine, we now have the state machine set a 'cr_op' variable which then controls what computation the CR data path does to set v.cr_result. The CR data path also handles updating the XERC result bits for integer operations (division and modulus). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	bbc485f336	FPU: Rework inputs to the main adder With this, the A input no longer has R as an option but now takes the rounding constants and the low-order bits of P (used as an adjustment in the square root algorithm). The B input has either R or zero. Both inputs can be optionally inverted for subtraction. The select inputs to the multiplexers now have 3 bits in opsel_a and 1 bit in opsel_b. The states which need R to be set now explicitly have set_r := 1 even though that is the default, essentially for documentation reasons. Similarly some states set opsel_b <= BIN_R even though that is the default. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	0e7c11a0e4	FPU: Move result_class logic outside of state machine The various states choose one of four operations (including no-op) to be done on result_class. Some operations have side-effects on arith_done or FPSCR. The DO_NAN_INF and DO_ZERO_DEN states still set result_class directly since their logic is expected to move out to a separate process later. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	5f0b2d433d	FPU: Simplify calculation of result_class For the various arithmetic operators, we only get to the DO_* states when the inputs are finite (not zero, infinity or NaN), so we can replace setting of v.result_class to r.a.class or r.b.class with an overall setting of it to FINITE in cycle 1 of all those operations. Also, integer division doesn't need to set the result class since the result is integer. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	70819c4c39	FPU: Do renormalization from DO_ZERO_DEN state Instead of having the various DO_* states (DO_FMUL, DO_FDIV, etc.) handle checking for denormalized inputs, we now have DO_ZERO_DEN state check for denormalized inputs and branch to RENORM_{A,B,C} to handle them. This also meant some changes were needed in how fsqrt and frsqrte handled inputs with odd exponent. The DO_FSQRT and DO_FRSQRTE states were very similar and have been combined into one. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	8648ddb64f	FPU: Eliminate EXC_RESULT state This lets us remove r.opsel_a and is a step towards moving the handling of exceptional cases out to a separate process. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	850b87c83f	FPU: Get rid of r.madd_cmp and r.exp_cmp This saves a few LUTs and simplifies the code a little. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	ba2add029a	FPU: Remove need to set opsel_a one cycle ahead Most states set opsel_a directly to select the operand for the A input of the main adder. The exception is the EXC_RESULT state, which uses r.opsel_a set by the previous cycle to indicate which input operand to use as the result. In order to make timing, ensure that the controls that select the inputs to the main adder (opsel_*, etc.) don't depend on any complicated functions of the data (such as px_nz, pcmpb_eq, pcmpb_lt, etc.), but are as far as possible constant for each state. There is now a control called set_r for whether the result is written to r.r, which enables us to avoid setting opsel_b or opsel_r conditionally in some cases. Also, to avoid a data-dependent setting of msel_2 in IDIV_DODIV state, the IDIV_NR1 and IDIV_NR2 states have been reworked so that completion of the required number of iterations is checked in IDIV_NR1 state, and at that point, if the inverse estimate is < 0.5, we go to IDIV_USE0_5 state in order to use 0.5 as the estimate. This means that in the normal case, the inverse estimate is already in Y when we get to IDIV_DODIV state. IDIV_USE0_5 has been reworked to put R (which will contain 0.5) into Y as the inverse estimate. That means that IDIV_DODIV state doesn't have any data-dependent logic to put either P or R into Y. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	2731384a4b	FPU: Reduce misc_sel to 3 bits Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	cf866ce910	FPU: Simplify logic for setting r.x Since r.x is mostly set from the value in r.r and only once from anything else (r.b.mantissa), move the check to before the input multiplexer for the main adder, so it works on r.r rather than whatever is selected by r.opsel_a. For the case in DO_FRSP where we have B selected by r.opsel_a, we add a new state so that we now get B into R and then check the low bits of R. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	4e5f856c55	FPU: Factor out some of the common elements of the DO_* states Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	2422585e14	FPU: Reduce use of r.insn inside the state machine Instead use things derived from the instruction in the first cycle, such as r.is_multiply, r.is_addition, etc. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	7812a55b6c	FPU: Reorganize NaN and infinity handling and improve arch compliance The architecture specifies that an invalid operation exception for signalling NaN (VXSNAN) can occur in the same instructions as an invalid operation exception for infinity times zero (VXIMZ) in the case of a multiply-add instruction where B is a signalling NaN, and one of A and C is infinity and the other is zero. This moves the invalid operation tests around so as to handle this case correctly. It also restructures the infinity and NaN cases to simplify the logic a little. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	9ac71cfbf2	tests/fpu: Add more floating multiply-add tests Add more tests to check that the result sign computations are correct. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	a3613d863b	FPU: Simplify sign calculation in FP multiply-add instructions By starting out with result_sign = +/- sign of B, we avoid the need to flip the result sign in a few places. This also simplifies DO_FMADD state a bit by having DO_ZERO_DEN go to DO_FMUL state for floating multiply-add where B is zero. (The RENORM_A2 and RENORM_C2 states already do this.) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	707dd619a0	FPU: Move NaN/infinity and zero/denorm handling out to separate states This should simplify the DO_* states and hopefully be simpler overall. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	27b3e42353	FPU: Move result_sign computations from state machine to a data path Instead of operating on result_sign directly, the state machine now sets a control variable "rsgn_op" that then directs a tiny ALU to do what's required. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	71b7df679b	FPU: Calculate quieten_nan in first cycle Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	955fa561fb	FPU: Move most result_sign computation out of state machine This moves the computation of r.result_sign out of the various states for most instructions. Now the sign is mostly computed in the first cycle (when e_in.valid is true). The set of operations done on r.result_sign in the state machine are now restricted to 5 (other than no change): invert, xor with r.is_subtract, or set to the sign of A, B or C. Similarly r.is_subtract and r.negate are computed in the first cycle now. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	c5abe3c0a9	Merge pull request #440 from paulusmack/compliance More compliance improvements - [H]DEXCR, no-op SPRs, writable TB	5 months ago
Paul Mackerras	413907e4bc	soc: Move timebase back into the core and enable writing to it Instead of a single global timebase register in the SoC, we now have a timebase counter in each core; however, now they are only reset by the soc reset, not the core reset. Thus they stay in sync even when some cores are disabled (via the syscon cpu_ctrl register). This implements mtspr to the TBLW and TBUW SPRs, which write the lower and upper 32 bits of this core's timebase, respectively. In order to fulfil the ISA's requirements that (a) some method for getting the timebases into sync and (b) some method for preventing userspace from reading the timebase be provided by the platform, this adds a syscon register TB_CTRL with two read/write bits implemented; bit 0 freezes all the timebases in the system when set, and bit 1 makes reading the timebase privileged (in all cores). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	f705fc5e19	core: Implement reserved/no-op SPR numbers SPR numbers 808 - 811 do nothing when read or written, that is, mfspr doesn't modify the destination register. This is accomplished in the same way that privileged mfspr to an unimplemented SPR is made a no-op, by supplying the old contents of the destination register as an input and writing that same value back. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	c49c32b5fe	core: Implement DEXCR and HDEXCR registers Of the defined aspect bits (which are all read-write), only the NPHIE and PHIE bits have any function at all, since Microwatt is an in-order single-issue machine and never does any branch speculation. Also, since there is no privileged non-hypervisor mode, the high 32 bits of DEXCR do nothing. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	bae24b12e7	Merge pull request #439 from paulusmack/master Update LiteX code for ethernet, SD card and DRAM	5 months ago
Paul Mackerras	3e0888ae35	litesdcard: Update generated code Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	3fb0a9ed26	litedram: Update generated code Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	ab7105f438	liteeth: Update generated code Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	370dbef593	Merge pull request #438 from paulusmack/master Improve timing and utilization, remove warnings	5 months ago
Paul Mackerras	f0c331b8b8	Arty A7: Reduce warnings from Vivado Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago
Paul Mackerras	1395bde3cc	core: Store hash key SPRs in the SPR RAM This moves HASHKEYR and HASHPKEYR to the SPR RAM that also stores things such as SRR0/1, LR and CTR. For hashst[p] and hashchk[p] instructions, execute1 reads the relevant key register from the RAM and sends it to loadstore1. This saves several LUTs. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago
Paul Mackerras	2c7d1e5d9c	decode: Split input B selection into two fields Instead of a single input_reg_b_t field in the decode table which selects both whether input B is a register or constant, and also which constant (immediate value) to use, we now have one field which selects whether input B is immediate (constant), a GPR, or an FPR, and a separate field to select which sort of immediate value to use. This results in simpler logic and better timing. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago
Paul Mackerras	e4e1a033bd	Merge pull request #437 from paulusmack/compliance Implement fixed-point hash instructions	6 months ago
Paul Mackerras	8f537c13bc	tests: Add a test for the hash instructions hash{st,chk}[p] Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago
Paul Mackerras	3bcc31fdda	core: Implement hashstp and hashchkp instructions and HASHPKEYR register These provide facilities similar to hashst, hashchk and HASHKEYR, but restricted to privileged mode. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago
Paul Mackerras	00a3db8457	decode1: Indicate instruction privilege in main decode table Previously the computation of whether an instruction is privileged or not was done based on the insn_type. However, that meant that lcix (OP_LOAD) and stcix (OP_STORE) couldn't be made privileged, and neither could tlbsync (OP_NOP). Instead, this adds a field to the main instruction decode table to indicate privileged instructions, and makes the cache-inhibited loads and stores privileged, along with tlbsync. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago
Paul Mackerras	0a11e8455f	core: Implement hashst and hashchk instructions These are done in loadstore1. The HashDigest function is computed in 9 cycles; for 8 cycles, a state machine does 4 steps of key expansion per cycle, and for each of 4 lanes of data, does 4 steps of ciphering; then there is 1 cycle to combine the results into the final hash value. At present, hashchk does not overlap the computation of the hash with fetching of data from memory (in the case of a cache miss). The 'is_signed' field in the instruction decode table is used to distinguish hashst and hashchk from ordinary loads and stores. We have a new 'RBC' value for input_reg_c_t which says that we are reading RB but we want the value to come in via the C port; this is because we want the 5-bit immediate offset on the B port. Note that in the list of insn_code values, hashst/chk have been put in the section for instructions with an RB operand, which is not strictly correct given that the B port is used for the immediate D operand; however, adding them to the section for instructions without an RB operand would have made that section exceed 128 entries, causing changes to the padding needed. The only downside to having hashst/chk where they are is that the debug logic can't use the RB port to read GPR/FPRs when a hashst/chk instruction is being decoded. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago
Paul Mackerras	e9b57ca5bf	Merge pull request #436 from paulusmack/smp Implement SMP	6 months ago
Paul Mackerras	0a2d3b6f58	loadstore1: Split DAWR check across a clock edge Instead of doing the address subtractions and subsequent logic for DAWR hit detection in the second cycle of a load or store, this does the subtractions in the first cycle and the remaining logic in the second cycle. This improves timing. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago
Paul Mackerras	d8423568b6	core: Evaluate rotator control signals in decode2 Hopefully this improves timing a bit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago
Paul Mackerras	d1c7b654bb	wishbone_arbiter: Remove early_sel optimization when > 4 masters For the sake of overall timing in larger SoCs, remove the early_sel optimization when there are more than 4 masters. Also make the ack and stall signals to a particular master depend on that master's cyc, not on the busy signal, which can depend on any master's cyc. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago
Paul Mackerras	bf55efec6d	Arty A7: Add an option to select the number of CPU cores Timing is currently not very good with 2 cores on the Arty A7-100. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago
Paul Mackerras	9bd6b3d175	xics: Implement destination server field in interrupt source registers This implements the server field in the XISRs (external interrupt source registers), allowing each interrupt source to be directed to a particular CPU. If the CPU number that is written is out of range, CPU 0 is used. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago
Paul Mackerras	3924ed0f49	xics: Implement a presentation controller per CPU core This is mainly in order to get IPIs. All external interrupts still go to CPU 0 for now. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago
Paul Mackerras	49fcbaa5b2	soc: Implement a global timebase across all cores Now all cores see the same timebase value at any given instant. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago
Paul Mackerras	e0c5af9bb1	mw_debug: Add -c flag to select which CPU core to address Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 months ago

1427 Commits (3268ef717cfbc38290b0b49be22cb6679e378fb9)