microwatt

Commit Graph

Author	SHA1	Message	Date
Paul Mackerras	d33f31509b	FPU: Clear S in ADD_SHIFT state Otherwise, if this is a multiply-add instruction and the result needs to be shifted left, bits of the product in S will contaminate the final result. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	b8f7cbd894	FPU: Record bits shifted out of addend in fmadd-family instructions If the addend is smaller than the product and thus needs to be shifted right, record if any bits are lost from the right end in r.x, so that the result gets rounded correctly. Also add a test that checks one such case. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	009ee1c9c5	FPU: Renormalize frsp operand if denormalized This arranges for the frsp operand to be renormalized if necessary. Without this, we can incorrectly get X set to 1 for denormalized operands, and hence the rounding may be done incorrectly. To make things clearer, we now have an explicit flag indicating when the B operand needs to be in normalized form. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	baf8f5f8c6	FPU: Force reserved FPSCR bit 11 to zero This ensures that the reserved FPSCR bit can never be set, by clearing it at the end of the fpu_1 process. Also remove a redundant setting of cr_result in the mcrfs code. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	a18c462b27	FPU: Ignore stale P contents in short-circuit multiply-add When a multiply-add is done with A or C equal to zero, the actual multiplication operation is not done, hence P is not valid, so in FINISH state we shouldn't set X based on P being non-zero. Fix this by clearing the is_multiply flag in the short-circuit case. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	41988e3b5f	FPU: Fix comparison of remainder in square root code The square root procedure needs to compare B - R^2 with 2R + 1 to decide whether to increment the square root estimate R by 1. It currently does this by putting 2R + 1 in B and using the pcmpb_lt and pcmpb_eq signals. This is not correct because the comparisons that generate those signals have a 2-bit shift embedded into them. Instead, put 2R + 1 into C and use pcmpc_lt/eq, which don't have the 2-bit shift. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	f3b9566ae2	FPU: Round to single precision for fcfid[u]s The fcfids and fcfidus instructions weren't rounding to single precision because r.longmask wasn't getting set. To fix this, set v.longmask to e_in.single for the fcfid* instructions. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	e5651e2eab	FPU: Avoid adding bias twice in UE=1 underflow case In case of underflow with UE=1, ROUND_UFLOW state adds the exponent bias and then goes to NORMALIZE state if the value is not normalized. Then NORMALIZE state will go back to ROUND_UFLOW if the exponent is still tiny, resulting in the bias getting added twice. To avoid this, if ROUND_UFLOW needs to do normalization, it goes to a new NORM_UFLOW state which does the normalization and goes to ROUNDING state. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	a0755935f4	FPU: Normalize B for fmadd family instructions If B is denormalized, but the A*C product is much smaller, then the result is B; in the UE=1 case we need to normalize the result, and the left shift to do that can bring in low-order product bits from S and corrupt the result. To avoid this, make sure B is normalized. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	32919435a3	FPU: Allow mtfsb* to set FPSCR[FX] implicitly If mtfsb1 causes an individual exception bit to go from 0 to 1, that should set FX as well. Arrange for this by setting update_fx to 1. Also make sure mcrfs doesn't copy the reserved FPSCR bit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	e471581222	FPU: Do result processing on denorm short-circuit results when FPSCR[UE] is set Results that are tiny (i.e., in the denorm range) need special processing when underflow exceptions are enabled, including in the cases where the result is just one of the input operands, such as for a fmadd with A or C equal to zero. To make sure this gets done, go to FINISH state rather than returning the relevant input operand as the result. The same logic is now used when the result needs to be rounded to single precision. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	0478fe41dd	FPU: Reset FPSCR[FR,FI] at beginning of fcfid* Otherwise a non-zero setting from a previous instruction won't get cleared. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	f252dba43d	FPU: Only apply zero subtraction result sign rule when result is exactly zero The rule in the ISA about the sign of the result of a subtraction when the magnitude of the result is zero only applies when the operands are equal in magnitude but opposite in sign, i.e. when the result is exactly zero. Add a check using FPSCR[FI] to exclude the cases where the exact result is non-zero but gets truncated to zero by rounding. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	8a204f1058	FPU: Set FPSCR exception summary based on individual invalid exception bits Rather than setting FPSCR[FX] to 1 when FPSCR[VX] transitions from 0 to 1, this sets it when any of the individual invalid exception bits (VXSNAN, VXISI, VXIDI, VXZDZ, VXIMZ, VXVC, VXSOFT, VXSQRT, VXCVI) transitions from 0 to 1. This better matches the ISA and P9 behaviour. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	fb71f62b83	FPU: Round finite special-case results to single precision if required When a special case is detected, such as a zero operand to an add, and the operation is a single-precision operation such as fadds, we need to round the result to single precision instead of just returning the relevant input operand unmodified. This accomplishes that by going to DO_FRSP_2 state from the special-case code for single-precision operations that return a finite floating-point result. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	de71a6119c	FPU: Make FPSCR bit 11 always read as 0 Bit 11 (52 in BE numbering) is a reserved bit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	ca792f3b13	FPU: Make convert-to-integer-word instructions behave like P9 The fctiw* instructions return a copy of the value in bits 31..0 in bits 63..32 of the result on P9, rather than a sign or zero extension of the word result. Make the FPU do the same. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	82825a11ba	FPU: Set result sign correctly for denorm +/- 0 case Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	37b1afc7f7	FPU: Make fri* instructions set FPSCR[FR,FI] to zero As required by the ISA. Also, never generate an inexact exception. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	dcd85164c6	FPU: Make fsel not alter FPSCR fsel is a move-type instruction, and hence shouldn't affect FPSCR. Set v.writing_fpr and v.instr_done, rather than setting arith_done, to achieve this. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	066e38b8ea	FPU: Do proper over/underflow handling for single-precision [fm]add The ADD_3 state incorporated some of the logic of the FINISH state, but in some cases assumed the result couldn't overflow or underflow - which is not true for single precision operations, if the input operands are outside the single precision range. Fix this, and simplify things, by having ADD_3 always go to FINISH state, which does the full overflow and underflow checking. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	d540171f60	FPU: Ignore Rc bit for mffs* variants other than plain mffs Bit 0 of the instruction is Rc for mffs but reserved for the other mffs* instructions. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	0e11f80f2f	FPU: Set FPSCR[FPRF] to zero for convert to integer operations This seems to be what P9 does. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	2f29daab2d	FPU: Fix setting of r.x for single-precision operations The fp_rounding function expects r.x to have been set based on the lower 31 bits of r.r, not 29 as presently done, so change 28 to SP_RBIT-1 (SP_RBIT is 31). Also add a test to check. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	577bbb8f5d	tests/fpu: Add test case for denorm input in frsp test Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	ab3783b61b	FPU: Fix setting of r.x Having computed rormr, use it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	7b1febcbd3	tests/fpu: Check setting of FR and FI in FPSCR by frsp instruction Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	e60840eabc	FPU: Make sure FR and FI in FPSCR get reset on special-case arith instructions Arithmetic instructions where the result is determined without doing any actual computation (i.e. the input(s) are NaNs, infinities, zeroes etc.) weren't resetting FR and FI properly. This combines the two blocks that handle the r.cycle_1_ar = 1 case to fix it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	0b3df8ab00	bitsort: Fix bperm instruction (#456) The result byte needs to be zero when the index byte value is >= 64. Fixes: `23ff954059` ("core: Change bperm to a simpler and slower implementation", 2025-01-07) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	da695e7927	execute1: Fix bug where LPCR[HEIC] disabled interrupts in problem state (#453) LPCR[HEIC] should only disable external interrupts in hypervisor mode, and not in problem state (user mode). This fixes the expression for irq_valid to do that. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	fabe9a4feb	Merge pull request #452 from paulusmack/master SPR and interrupt bug fixes, implement LPCR[EVIRT], plus logic/timing improvements	3 months ago
Paul Mackerras	79e69d2a23	execute2: Simplify execute2 logic to improve timing This aims to simplify the logic in the execute2_1 process. It is not really necessary to preserve the contents of ex2 when stalled, except for ex2.e.last_nia; but when stalled, bits which would initiate downstream actions, such as ex2.e.valid, ex2.e.interrupt and ex2.se, should be cleared. Also, the path through stage2_stall to the bypass valid signal has shown up as a critical path. This dependency is there because the mfspr instruction to a slow SPR or a PMU SPR should not forward a result before the instruction is about to complete, because the result might change (for example when reading the timebase). To avoid this dependency, we simply don't forward results for mfspr to slow/PMU SPRs. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	9326fc7f18	tests/modes: Test that mfspr/mtspr to unimplemented SPR in user mode causes HEAI Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	0255283159	tests/spr_read: Test that mfspr/mtspr to SPRs 0,4,5,6 generate HEAI Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	5548a5ba26	execute1: Make mfspr/mtspr to SPRs 0,4,5,6 generate HEAI The ISA specifies that mfspr or mtspr to SPR 0, 4, 5 or 6 should generate a hypervisor emulation assistance interrupt in privileged mode, so this adds logic to do that. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	9c40ddffd2	execute1: Implement LPCR[EVIRT] bit This implements the EVIRT bit in the LPCR register. When set to 1, EVIRT causes mfspr and mtspr to an undefined SPR number in privileged mode (i.e. hypervisor mode) to cause a hypervisor emulation assistance interrupt. When set to 0, such instructions are executed as no-ops. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	1d758f1d74	execute1: Simplify no-op behaviour of mfspr When mfspr is performed to one of the reserved no-op SPRs, or to an undefined SPR in privileged state, the behaviour is a no-op, that is, the destination register is not written. Previously this was done by writing back the same value that the register had before the instruction, but in fact it can be done simply by negating the write enable signal so that the result GPR is not written. This gives a small reduction in logic complexity. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	788f7a1755	core: Improve timing on bypass control paths In order to improve timing, the bypass paths now carry the register number being written as well as the tag. The decisions about which bypasses to use for which operands are then made by comparing the register numbers rather than by determining a tag from the register number and then comparing tags. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	f2166d326c	tests/fpu: Add a test for result writing being suppressed When an arithmetic instruction generates an invalid operation exception or a divide by zero exception, and that exception is enabled in the FPSCR, the writing of the result to the destination register should be suppressed, leaving whatever value was last written in the destination. Add a check that this occurs correctly, for the cases of square root of a negative number (invalid operation exception) and division by zero (zero divide exception). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	34cf092bf6	control: Fix forwarding when previous result write is suppressed If we have two successive instructions that write the same result register and then a third that uses the same register as an input, and the second instruction suppresses the write of its result, we can currently end up with the third instruction using the wrong value, because it uses the register value from before the first instruction rather than the result of the first instruction. (An example of an instruction suppressing the write of its result is a floating-point instruction that generates an enabled invalid operation exception but not an interrupt.) To fix this, the control module now uses any forwarded value for the register we want, not just the most recent value, but still stalls until it has the most recent value, or the previous instruction completes. Thus in the case described above, decode2 will have latched the value from the first instruction and so the third instruction gets the correct value. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	9f9f9046ee	tests/spr_read: Add a check for no-op behaviour of mtspr and mfspr Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	4073aa5ffd	execute1: Fix setting HEIR and FSCR[IC] on interrupts Code in the execute1_actions process that handles illegal and facility unavailable interrupts was setting actions.se.set_heir or actions.se.set_ic, but then because actions.exception was also set, the contents of actions.se were ignored, meaning that HEIR or FSCR[IC] were not getting updated. To fix this, execute1_1 now conditions use of those fields on valid_in rather than go. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	6fe0b6e444	execute1: Fix no-op behaviour of reading undefined SPRs In privileged mode, mfspr from an undefined or unimplemented SPR number should be a no-op, which is implemented here by writing back the same value that the destination register previously had. However, we ended up writing back 0 because ex1.res2_sel was not set correctly. To fix this, set res2_sel to 10 in the undefined SPR case. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	7619df6b78	core: Implement HRMOR as a read-only zero register (#450) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Boris Shingarov	198ad6d199	genesys2: Fix SPI_FLASH_OFFSET (#449) Signed-off-by: Boris Shingarov <shingarov@labware.com>	5 months ago
Boris Shingarov	a6858f716a	genesys2: Fix DDR3 PHY cmd_latency (#448) In the fall of 2020, cmd/clk scan in liblitedram was changed in a way that required reverting cmd_latency being set to 1 in LiteDRAM commit 4e62d28 back to 0. For the default in s7ddrphy.py this revert happened in 496cd27, but for standalone gen the .yml was never updated in either LiteDRAM or Microwatt, leading to a regression: https://github.com/antonblanchard/microwatt/issues/363 The present commit updates the .yml so DRAM works on Genesys2 again. See also https://github.com/enjoy-digital/litedram/pull/368 for a corresponding update to the .yml in LiteDRAM. Signed-off-by: Boris Shingarov <shingarov@labware.com>	5 months ago
Paul Mackerras	152eef1156	Merge pull request #446 from paulusmack/master Implement hypervisor doorbells	5 months ago
Paul Mackerras	d2bf3f3580	core: Implement hypervisor doorbell interrupt and msg* instructions This implements the hypervisor doorbell exception and interrupt and the msgsnd, msgclr and msgsync instructions (msgsync is a no-op). The msgsnd instruction can generate a hypervisor doorbell interrupt on any CPU in the system. To achieve this, each core sends its hypervisor doorbell messages to the soc level, which ORs together the bits for each CPU and sends it to that CPU. The privileged doorbell exception/interrupt and the msgsndp/msgclrp instructions are not required since we don't implement SMT. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	ca872faede	core: Consolidate several OP_* values into a single OP_COMPUTE This replaces OP_ADDG6S, OP_BCD, OP_BREV, OP_CMPB, OP_CMPEQB, OP_CMPRB, OP_CROP, OP_EXTS, OP_EXTSWSLI, OP_ISEL, OP_LOGIC, OP_MFCR, OP_PRTY, OP_RLC, OP_RLCL, OP_RLCR, OP_SETB, OP_SHL, OP_SHR, and OP_XOR with a single OP_COMPUTE. The replaced operations are all ones which just compute a result value (for GPR or CR) in execute1, don't have any other side effects, and aren't used in decode2 to determine other signals. The operation to be performed is sufficiently defined by the result and subresult fields in the decode table. With the elimination of OP_SPARE, this reduces the number of insn_type_t values to 44. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	a764fd464e	Merge pull request #445 from paulusmack/master Various improvements, including SMP support for the Acorn-CLE-215 board.	5 months ago
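
Illustration for commit 0b3df8ab00 (bperm fix): a minimal Python sketch of the bit-permute-doubleword behaviour, assuming the Power ISA definition in which each of the eight bytes of RS is an index into the 64 bits of RB (big-endian bit numbering) and an index of 64 or more selects a zero. The function name `bpermd` and the integer model are illustrative only. This is not the Microwatt VHDL.

```python
MASK64 = (1 << 64) - 1

def bpermd(rs: int, rb: int) -> int:
    """Software model: RS supplies eight index bytes, RB supplies the source bits."""
    rs &= MASK64
    rb &= MASK64
    result = 0
    for i in range(8):
        # Byte i in big-endian numbering: byte 0 is the most significant byte of RS.
        index = (rs >> (8 * (7 - i))) & 0xFF
        if index < 64:
            # Big-endian bit `index` of RB is bit (63 - index) counting from the LSB.
            bit = (rb >> (63 - index)) & 1
        else:
            # Out-of-range index: this is the case the commit fixes, the bit must be 0.
            bit = 0
        # Permuted bit i lands in bit (7 - i) of the low byte of the result.
        result |= bit << (7 - i)
    return result
```

With index bytes 0 through 7, the model returns the most significant byte of RB. With every index byte at 64 or above, it returns zero regardless of RB.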
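
Illustration for commit 41988e3b5f (square root remainder comparison): the message describes comparing B - R^2 with 2R + 1 to decide whether to increment the estimate R. The sketch below shows why that test works, using plain Python integers. The name `refine_root` is hypothetical, and the real FPU operates on fixed-point mantissas with its own comparison signals, so this is only an arithmetic model of the decision.

```python
def refine_root(b: int, r: int) -> int:
    """Given r with r*r <= b, return r + 1 if (r + 1)^2 also fits in b, else r."""
    remainder = b - r * r
    # (r + 1)^2 = r^2 + 2r + 1, so r + 1 is still a valid root estimate
    # exactly when the remainder is at least 2r + 1.
    if remainder >= 2 * r + 1:
        return r + 1
    return r
```

For example, `refine_root(16, 3)` yields 4 because the remainder 7 equals 2*3 + 1, while `refine_root(15, 3)` stays at 3 because the remainder 6 falls short.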


1506 Commits (d33f31509b1f3259e0a350bf174192191c03b99e)
