microwatt

Commit Graph

Author	SHA1	Message	Date
Paul Mackerras	d540171f60	FPU: Ignore Rc bit for mffs* variants other than plain mffs Bit 0 of the instruction is Rc for mffs but reserved for the other mffs* instructions. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	0e11f80f2f	FPU: Set FPSCR[FPRF] to zero for convert to integer operations This seems to be what P9 does. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	2f29daab2d	FPU: Fix setting of r.x for single-precision operations The fp_rounding function expects r.x to have been set based on the lower 31 bits of r.r, not 29 as presently done, so change 28 to SP_RBIT-1 (SP_RBIT is 31). Also add a test to check. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	577bbb8f5d	tests/fpu: Add test case for denorm input in frsp test Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	ab3783b61b	FPU: Fix setting of r.x Having computed rormr, use it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	7b1febcbd3	tests/fpu: Check setting of FR and FI in FPSCR by frsp instruction Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	e60840eabc	FPU: Make sure FR and FI in FPSCR get reset on special-case arith instructions Arithmetic instructions where the result is determined without doing any actual computation (i.e. the input(s) are NaNs, infinities, zeroes etc.) weren't resetting FR and FI properly. This combines the two blocks that handle the r.cycle_1_ar = 1 case to fix it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	0b3df8ab00	bitsort: Fix bperm instruction (#456) The result byte needs to be zero when the index byte value is >= 64. Fixes: `23ff954059` ("core: Change bperm to a simpler and slower implementation", 2025-01-07) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 weeks ago
Paul Mackerras	da695e7927	execute1: Fix bug where LPCR[HEIC] disabled interrupts in problem state (#453) LPCR[HEIC] should only disable external interrupts in hypervisor mode, and not in problem state (user mode). This fixes the expression for irq_valid to do that. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	fabe9a4feb	Merge pull request #452 from paulusmack/master SPR and interrupt bug fixes, implement LPCR[EVIRT], plus logic/timing improvements	3 months ago
Paul Mackerras	79e69d2a23	execute2: Simplify execute2 logic to improve timing This aims to simplify the logic in the execute2_1 process. It is not really necessary to preserve the contents of ex2 when stalled, except for ex2.e.last_nia; but when stalled, bits which would initiate downstream actions, such as ex2.e.valid, ex2.e.interrupt and ex2.se, should be cleared. Also, the path through stage2_stall to the bypass valid signal has shown up as a critical path. This dependency is there because the mfspr instruction to a slow SPR or a PMU SPR should not forward a result before the instruction is about to complete, because the result might change (for example when reading the timebase). To avoid this dependency, we simply don't forward results for mfspr to slow/PMU SPRs. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	9326fc7f18	tests/modes: Test that mfspr/mtspr to unimplemented SPR in user mode causes HEAI Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	0255283159	tests/spr_read: Test that mfspr/mtspr to SPRs 0,4,5,6 generate HEAI Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	5548a5ba26	execute1: Make mfspr/mtspr to SPRs 0,4,5,6 generate HEAI The ISA specifies that mfspr or mtspr to SPR 0, 4, 5 or 6 should generate a hypervisor emulation assistance interrupt in privileged mode, so this adds logic to do that. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	9c40ddffd2	execute1: Implement LPCR[EVIRT] bit This implements the EVIRT bit in the LPCR register. When set to 1, EVIRT causes mfspr and mtspr to an undefined SPR number in privileged mode (i.e. hypervisor mode) to cause a hypervisor emulation assistance interrupt. When set to 0, such instructions are executed as no-ops. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	1d758f1d74	execute1: Simplify no-op behaviour of mfspr When mfspr is performed to one of the reserved no-op SPRs, or to an undefined SPR in privileged state, the behaviour is a no-op, that is, the destination register is not written. Previously this was done by writing back the same value that the register had before the instruction, but in fact it can be done simply by negating the write enable signal so that the result GPR is not written. This gives a small reduction in logic complexity. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	788f7a1755	core: Improve timing on bypass control paths In order to improve timing, the bypass paths now carry the register number being written as well as the tag. The decisions about which bypasses to use for which operands are then made by comparing the register numbers rather than by determining a tag from the register number and then comparing tags. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	f2166d326c	tests/fpu: Add a test for result writing being suppressed When an arithmetic instruction generates an invalid operation exception or a divide by zero exception, and that exception is enabled in the FPSCR, the writing of the result to the destination register should be suppressed, leaving whatever value was last written in the destination. Add a check that this occurs correctly, for the cases of square root of a negative number (invalid operation exception) and division by zero (zero divide exception). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	34cf092bf6	control: Fix forwarding when previous result write is suppressed If we have two successive instructions that write the same result register and then a third that uses the same register as an input, and the second instruction suppresses the write of its result, we can currently end up with the third instruction using the wrong value, because it uses the register value from before the first instruction rather than the result of the first instruction. (An example of an instruction suppressing the write of its result is a floating-point instruction that generates an enabled invalid operation exception but not an interrupt.) To fix this, the control module now uses any forwarded value for the register we want, not just the most recent value, but still stalls until it has the most recent value, or the previous instruction completes. Thus in the case described above, decode2 will have latched the value from the first instruction and so the third instruction gets the correct value. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	9f9f9046ee	tests/spr_read: Add a check for no-op behaviour of mtspr and mfspr Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	4073aa5ffd	execute1: Fix setting HEIR and FSCR[IC] on interrupts Code in the execute1_actions process that handles illegal and facility unavailable interrupts was setting actions.se.set_heir or actions.se.set_ic, but then because actions.exception was also set, the contents of actions.se were ignored, meaning that HEIR or FSCR[IC] were not getting updated. To fix this, execute1_1 now conditions use of those fields on valid_in rather than go. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	6fe0b6e444	execute1: Fix no-op behaviour of reading undefined SPRs In privileged mode, mfspr from an undefined or unimplemented SPR number should be a no-op, which is implemented here by writing back the same value that the destination register previously had. However, we ended up writing back 0 because ex1.res2_sel was not set correctly. To fix this, set res2_sel to 10 in the undefined SPR case. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	7619df6b78	core: Implement HRMOR as a read-only zero register (#450) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Boris Shingarov	198ad6d199	genesys2: Fix SPI_FLASH_OFFSET (#449) Signed-off-by: Boris Shingarov <shingarov@labware.com>	5 months ago
Boris Shingarov	a6858f716a	genesys2: Fix DDR3 PHY cmd_latency (#448) In the fall of 2020, cmd/clk scan in liblitedram was changed in a way that required reverting cmd_latency, which had been set to 1 in LiteDRAM commit 4e62d28, back to 0. For the default in s7ddrphy.py this revert happened in 496cd27, but for standalone gen the .yml was never updated in either LiteDRAM or Microwatt, leading to a regression: https://github.com/antonblanchard/microwatt/issues/363 The present commit updates the .yml so DRAM works on Genesys2 again. See also https://github.com/enjoy-digital/litedram/pull/368 for a corresponding update to the .yml in LiteDRAM. Signed-off-by: Boris Shingarov <shingarov@labware.com>	5 months ago
Paul Mackerras	152eef1156	Merge pull request #446 from paulusmack/master Implement hypervisor doorbells	5 months ago
Paul Mackerras	d2bf3f3580	core: Implement hypervisor doorbell interrupt and msg* instructions This implements the hypervisor doorbell exception and interrupt and the msgsnd, msgclr and msgsync instructions (msgsync is a no-op). The msgsnd instruction can generate a hypervisor doorbell interrupt on any CPU in the system. To achieve this, each core sends its hypervisor doorbell messages to the soc level, which ORs together the bits for each CPU and sends it to that CPU. The privileged doorbell exception/interrupt and the msgsndp/msgclrp instructions are not required since we don't implement SMT. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	ca872faede	core: Consolidate several OP_* values into a single OP_COMPUTE This replaces OP_ADDG6S, OP_BCD, OP_BREV, OP_CMPB, OP_CMPEQB, OP_CMPRB, OP_CROP, OP_EXTS, OP_EXTSWSLI, OP_ISEL, OP_LOGIC, OP_MFCR, OP_PRTY, OP_RLC, OP_RLCL, OP_RLCR, OP_SETB, OP_SHL, OP_SHR, and OP_XOR with a single OP_COMPUTE. The replaced operations are all ones which just compute a result value (for GPR or CR) in execute1, don't have any other side effects, and aren't used in decode2 to determine other signals. The operation to be performed is sufficiently defined by the result and subresult fields in the decode table. With the elimination of OP_SPARE, this reduces the number of insn_type_t values to 44. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	a764fd464e	Merge pull request #445 from paulusmack/master Various improvements, including SMP support for the Acorn-CLE-215 board.	5 months ago
Paul Mackerras	8f6c727309	execute1: Rework data paths for mfspr and mtspr Data being written to an SPR by mtspr now comes in to execute2 via ex1.write_spr_data (renamed from ex1.ramspr_odd_data) rather than ex1.e.write_data. This eliminates the need for the main result mux in execute1 to be able to pass the c_in value through. For mfspr, the no-op behaviour is obtained by selecting ex1.write_spr_data as spr_result in execute2. We already had ex1.write_spr_data being set from c_in, so no new logic is required there. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	fc3ff2d340	logical: Use sub_select rather than insn_type to select logical op Also select the RS passthrough in the logical unit by default for mfspr, which is needed for the no-op SPRs and the no-op behaviour of privileged mfspr to unimplemented SPRs. For slow SPRs the RS behaviour gets passed through from execute1 to execute2 and replaced by the correct result in execute2's result mux. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	54173a0677	decode: Move result_sel and subresult_sel into main decode table Instead of working out result_sel and subresult_sel in decode2 from the insn_type, they now come directly from the main decode table in decode1. This reduces the need for distinct insn_type values and should enable us to avoid expanding insn_type beyond 6 bits. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	8 months ago
Paul Mackerras	8bfce4890b	predecode: Add some more comments No code change. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	8 months ago
Paul Mackerras	0f8c4afc52	openocd: Update arty config for newer openocd versions Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	8 months ago
Paul Mackerras	0bf1dcedbd	acorn-cle-215: Implement SMP and enable FPU and BTC The four LEDs on the Acorn-CLE-215 (Nitefury) board become run lights for the first four CPUs. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	8 months ago
Paul Mackerras	ce5a967ac2	soc: Allow for up to 1GB of DRAM in address decoding The Acorn-CLE-215 board has 1GB of DRAM. Without this, the top 512MB of DRAM is not accessible. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	8 months ago
Paul Mackerras	4282d37741	FPU: Faster method for testing for 1-bits at right end of R At various points we need to set the X bit if any bit of R which would be shifted out by a right shift of N bits is a 1. We can do this by computing R | -R, which contains a 1 in the position of the right-most 1-bit in R and in all positions to the left, and zeroes to the right. That means we can test for the least-significant N bits being non-zero by testing whether bit N-1 of (R | -R) is a 1. Doing this uses fewer LUTs and has better timing than the old method of generating a mask, ANDing it with R, and testing whether the result is non-zero. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	04b0c901e0	dcache: Simplify expression for read enable of cache RAM The path from execute_to_loadstore.valid through to the read enable of the cache RAM has shown up as a critical path. In fact we can simplify this by always asserting read enable when not stalled. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	8605dcb4f1	decode2: Use register addresses from decode1 rather than recomputing them Currently, decode2 computes register addresses from the input_reg fields in the decode table entry and the instruction word. This duplicates a computation that decode1 has already done based on the insn_code value. Instead of doing this redundant computation, just use the register addresses supplied by decode1. This means that the decode_input_reg_* functions merely compute whether the register operand is used or not. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	e14712e45c	core: Simplify operand presentation for hash instructions This removes the cases in the decode stages which allowed the C register address to come from the RB field for the hash instructions (hashst[p], hashchk[p]), and generated a negative immediate value for the B operand. The motivation is to simplify the logic for the C register address. Instead, the unusual construction of the address for the hash instructions is handled in the loadstore1_in process, and the hash computation uses the A and B operands rather than A and C. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	dc9d351833	Merge pull request #444 from paulusmack/master Miscellaneous improvements	9 months ago
Paul Mackerras	de2e8f81ee	decode: Execute cpabort as a no-op It seems that the Linux kernel executes cpabort on any CPU that implements ISA v3.1 or later, despite cpabort being optional. To cope with this, implement cpabort as a no-op. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	b65dde1a95	arty a7: Display run status of two CPUs on LEDs 6 and 7 The run status LED is off when the core is held in reset (e.g. when the second core hasn't been started yet). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	51dd7f578f	countbits: Move more popcount calculation before the clock edge Popcount takes two cycles to execute. The computation of the final popcount value in the second cycle has shown up as a critical path on the Artix-7, so move one stage of the summation back into the first cycle. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	b14dd43ce6	Merge pull request #443 from paulusmack/compliance More architecture compliance improvements: LPCR, [U]SIER[23], [U]MMCR3, HMER, HMEER. Remove HFSCR and associated logic.	9 months ago
Paul Mackerras	361a01259c	Merge pull request #441 from paulusmack/dcache This reworks the dcache to try and simplify the logic and alleviate some of the paths that have been showing up as critical paths in synthesis. An example is a dependency of the req_is_hit signal on the wishbone ack, which this series removes. Overall this seems to have reduced LUT usage and improved timing.	9 months ago
Paul Mackerras	7e544c1fb8	Merge pull request #442 from paulusmack/fpu This reworks the FPU logic to try and get closer to the point where the big state machine could be converted into microcode. This means that as far as possible the state machine should just set control lines, ideally with as little conditional logic in each state as possible, and that anything that is considered data should be manipulated outside of the state machine. This also improves architecture compliance in the area of exception handling, and alleviates some critical paths.	9 months ago
Paul Mackerras	8f7326a824	core: Implement various SPRs which read zero and ignore writes This implements [U]SIER2, [U]SIER3, [U]MMCR3, HMER and HMEER as SPRs which return zero when read, and ignore writes. The zero value is provided via the slow SPR read multiplexer. To avoid increasing the size of the selector from 4 bits to 5, the (implementation specific) LOG_ADDR and LOG_DATA SPRs now share a single selector value. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	1da8476cf9	dcache: Simplify forwarding of load data while reloading a cache line This removes a dependency of req_is_hit and similar signals on the wishbone ack input, by removing use_forward_rl, and making idx_reload not dependent on wr_row_match and wishbone_in.ack. Previously if a load in r0 hit the doubleword being supplied from memory, that was treated as a hit and the data was forwarded via a multiplexer associated with the cache RAM. Now it is called a miss and completed by the logic in the RELOAD_WAIT_ACK state of the state machine. The only downside is that now the selection of data source in the dcache_fast_hit process depends on req_is_hit rather than r1.full. Overall this change seems to reduce the number of LUTs, and make timing easier on the ECP-5. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	c938246cc8	dcache: Simplify addressing of the dcache TLB Instead of having TLB invalidation and TLB load requests come through the dcache main path, these operations are now done in one cycle entirely based on signals from the MMU, and don't involve the TLB read path or the dcache state machine at all. So that we know which way of the TLB to affect for invalidations, loadstore1 now sends down a "TLB probe" operation for tlbie instructions which goes through the dcache pipeline and sets the r1.tlb_hit_* fields which are used in the subsequent invalidation operation from the MMU (if it is a single-page invalidation). TLB load operations write to the way identified by r1.victim_way, which was set on the TLB miss that triggered the TLB reload. Since we are writing just one way of the TLB tags now, rather than writing all ways with one way's value changed, we now pad each way to a multiple of 8 bits so that byte write-enables can be used to select which way gets written. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
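Commit 0b3df8ab00 makes bperm produce zero for an index byte value of 64 or more. As an illustration only, the Python model below follows the Power ISA definition of bpermd as we understand it (each of the 8 bytes of RS indexes one bit of RB, using IBM bit numbering where bit 0 is the most-significant bit). It is not Microwatt's VHDL, and the function name `bpermd` is just a label.

```python
MASK64 = (1 << 64) - 1


def bpermd(rs: int, rb: int) -> int:
    """Reference model of bpermd: 8 index bytes in RS select bits of RB."""
    rs &= MASK64
    rb &= MASK64
    result = 0
    for i in range(8):
        # Byte i of RS, counting from the most-significant byte.
        index = (rs >> (56 - 8 * i)) & 0xFF
        if index < 64:
            # IBM bit `index` of RB is bit (63 - index) counting from the LSB.
            bit = (rb >> (63 - index)) & 1
        else:
            # Out-of-range index: the selected bit is forced to zero.
            bit = 0
        # Result bit i lands in IBM bit 56+i, i.e. bit (7 - i) from the LSB.
        result |= bit << (7 - i)
    return result
```

For example, `bpermd(0x00FFFFFFFFFFFFFF, 1 << 63)` gives `0x80`: only the first index byte (0) is in range, and it selects the top bit of RB.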
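Commit da695e7927 restricts LPCR[HEIC] so that it masks external interrupts only in hypervisor mode, not in problem state. A simplified boolean sketch of that condition follows; the signal names are hypothetical, and it deliberately ignores the other enabling rules the ISA defines, so it shows the shape of the fix rather than Microwatt's actual irq_valid expression.

```python
def irq_valid(ext_irq: bool, msr_ee: bool, msr_hv: bool, msr_pr: bool,
              lpcr_heic: bool) -> bool:
    """Hypothetical, simplified external-interrupt enable condition."""
    # HEIC only blocks interrupts in hypervisor state (HV=1, PR=0);
    # in problem state (PR=1) it has no effect.
    heic_blocks = lpcr_heic and msr_hv and not msr_pr
    return bool(ext_irq and msr_ee and not heic_blocks)
```

Before the fix, the equivalent of `heic_blocks` did not include the `not msr_pr` term, so setting HEIC also suppressed interrupts in user mode.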
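Commits 9326fc7f18, 5548a5ba26, 9c40ddffd2 and 1d758f1d74 together describe what mfspr/mtspr do for SPR numbers that are not implemented. The decision function below is a sketch of those rules as stated in the messages: HEAI in user mode, HEAI for SPRs 0, 4, 5 and 6 in privileged mode, and otherwise HEAI or a no-op depending on LPCR[EVIRT]. The names are hypothetical, and it ignores privileged-SPR checks and the reserved no-op SPRs.

```python
from enum import Enum


class SprAction(Enum):
    HEAI = "hypervisor emulation assistance interrupt"
    NOOP = "no-op (destination GPR not written)"


# SPR numbers which, per the commit log, must raise HEAI in privileged mode.
HEAI_SPRS = frozenset({0, 4, 5, 6})


def unimplemented_spr_action(spr: int, problem_state: bool,
                             lpcr_evirt: bool) -> SprAction:
    """Hypothetical model of mfspr/mtspr to an unimplemented SPR number."""
    if problem_state:
        return SprAction.HEAI  # user mode: always HEAI
    if spr in HEAI_SPRS:
        return SprAction.HEAI  # architected exceptions to the no-op rule
    # Privileged (hypervisor) mode: EVIRT selects HEAI versus no-op.
    return SprAction.HEAI if lpcr_evirt else SprAction.NOOP
```

In hardware, the `NOOP` case is what commit 1d758f1d74 implements by deasserting the GPR write enable rather than writing back the old register value.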
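Commit 788f7a1755 has each bypass path carry the destination register number so that operand selection compares register numbers directly. The sketch below models only that comparison; the `BypassPath` type, the youngest-first ordering and the register-file fallback are assumptions for illustration, and the tags Microwatt also carries are omitted.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BypassPath:
    valid: bool
    reg: int    # register number being written by the in-flight instruction
    value: int  # result value available on this path


def select_operand(reg: int, bypasses: Sequence[BypassPath],
                   regfile: Sequence[int]) -> int:
    """Pick an operand by comparing register numbers on each bypass path."""
    # Assumed: paths are ordered youngest first, so the first match wins.
    for path in bypasses:
        if path.valid and path.reg == reg:
            return path.value
    # No in-flight writer for this register: use the register file.
    return regfile[reg]
```

Comparing register numbers avoids first mapping the register number to a tag and then comparing tags, which is the extra step the commit removes from the timing path.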
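Commit d2bf3f3580 routes hypervisor doorbells by having each core send its messages to the SoC, which ORs together the bits for each CPU. A minimal model of that OR-reduction follows, assuming each core's messages are represented as a bitmask of target CPUs; the function name and representation are hypothetical.

```python
from typing import List, Sequence


def route_doorbells(per_core_msgs: Sequence[int], ncpus: int) -> List[bool]:
    """OR together each core's doorbell bitmask; one output per CPU."""
    combined = 0
    for msgs in per_core_msgs:
        combined |= msgs  # bit i set => some core is signalling CPU i
    return [bool((combined >> i) & 1) for i in range(ncpus)]
```

For instance, with core 0 signalling CPU 1 and core 1 signalling CPU 2, `route_doorbells([0b0010, 0b0100, 0], 4)` returns `[False, True, True, False]`.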
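Commit 4282d37741 tests whether the N least-significant bits of R are non-zero by checking bit N-1 of (R | -R). The Python sketch below shows that trick next to the older mask-and-AND method the commit replaced, so the two can be compared; the explicit `width` masking stands in for a fixed-width hardware register and is an assumption of the sketch.

```python
def low_bits_nonzero(r: int, n: int, width: int = 64) -> bool:
    """True if any of the n least-significant bits of r is 1 (1 <= n <= width)."""
    mask = (1 << width) - 1
    r &= mask
    # r | -r has ones from the lowest set bit of r leftwards, zeroes below it.
    smeared = (r | -r) & mask
    return bool((smeared >> (n - 1)) & 1)


def low_bits_nonzero_mask(r: int, n: int) -> bool:
    """Older approach: build an n-bit mask, AND it with r, test for non-zero."""
    return (r & ((1 << n) - 1)) != 0
```

With R = 0b1000, N = 3 gives False (the low three bits are zero) and N = 4 gives True. In hardware, the first form needs only a negation, an OR and a single bit select, instead of a variable-width mask generator and a wide OR-reduction.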
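Commit c938246cc8 pads each TLB way's tag to a multiple of 8 bits so that byte write-enables can select which way is written. The helpers below sketch that arithmetic; the tag width used in the example (45 bits) is hypothetical and not taken from the dcache source.

```python
def padded_way_bits(tag_bits: int) -> int:
    """Round one way's tag width up to a whole number of bytes."""
    return (tag_bits + 7) // 8 * 8


def way_byte_enables(way: int, tag_bits: int) -> int:
    """Byte write-enable mask covering only the bytes that hold `way`."""
    bytes_per_way = padded_way_bits(tag_bits) // 8
    return ((1 << bytes_per_way) - 1) << (way * bytes_per_way)
```

With 45-bit tags, each way is padded to 48 bits (6 bytes), so writing way 1 asserts byte enables `0xFC0` and leaves way 0's bytes (`0x3F`) untouched.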

1485 Commits (d540171f60eff051a419cf290b90143e0c137bd0)