microwatt

Commit Graph

Author	SHA1	Message	Date
Paul Mackerras	d531e8aa10	dcache: Improve timing Previously we only put slow requests in r1.req, but that caused timing problems because it meant the clock enable for all the registers in r1.req depended on whether we have a TLB and cache hit or not. Now we put any valid request (i.e. with req_go = 1) into r1.req, which has better timing because req_go is a relatively simple function of registered values (r0_full, r0_valid, r0.tlbie, r0.tlbld, r1.full, r1.ls_error, d_in.hold). We still have to work out if we have a slow request, but that is only needed for the D input of one register (r1.full). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	5121e0f392	core: Implement sync instructions This implements all the sync variants (sync, lwsync, ptesync, etc.) as a LSU op that gets sent down to the dcache and completes once the dcache state machine is idle. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	00efcc2c3b	dcache: Make aligned quadword loads and stores actually be atomic This implements logic in the dcache to make aligned quadword loads and stores atomic with respect to other mechanisms that access memory. Such loads and stores are already marked with the atomic_qw bit in Loadstore1ToDcacheType. For quadword loads where the first dword access hits in the cache, we record the fact of the hit and the cache way used (r1.prev_hit and r1.prev_way). The second dword access then assumes a hit on the same way even if the cache line has been invalidated in the mean time by a snooped store. This gives the same effect as would loading both dwords at the time of the first dword load. For a lqarx, the reservation is set at the time of the first dword load, so if there is such a snooped store, the reservation will be invalid by the time the lqarx completes. If the first dword load hits on the cache line being refilled, so should the second, unless the refill finishes. In that case we set r1.prev_hit and r1.prev_way so the second load can use the line just refilled (but only if the first dword hit the line being refilled). For stores, the req.atomic_more flag is set on the first dword store, and that causes the STORE_WAIT_ACK state to wait for the next request without dropping cyc, so it is not possible for another wishbone master to insert an access between the writes of the two dwords to memory. For store-conditionals, DO_STCX state now transitions to STORE_WAIT_ACK state once the store has been accepted (stall is false). This means that the second store for a stqcx can be handled in the same way as the second store for a stq. Once the first store for a stqcx has succeeded, the second store is done unconditionally. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	c2dcf4b334	dcache: Generate a DSI on larx/stcx to non-cacheable memory Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	0fbeaa2a01	dcache: Use discrete req_op_* signals instead of an encoded req_op Hopefully this will improve timing by reducing unnecessary dependencies and giving more opportunities for routing. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	ba4614c5f4	dcache: Implement data cache touch and flush instructions This implements dcbf, dcbt and dcbtst in the dcache. The dcbst (data cache block store) instruction remains a no-op because our dcache is write-through and therefore never has modified data that could need to be written back. Dcbt (data cache block touch) and dcbtst (data cache block touch for store) behave similarly except that dcbtst is a no-op on a readonly page. Neither instruction ever causes an interrupt. If they miss in the cache and the page is cacheable, they are handled like a load miss except that they complete immediately the state machine starts handling the load miss rather than waiting for any data. Dcbf (data cache block flush) can cause a data storage interrupt. If it hits in the cache, the state machine goes to a new FLUSH_CYCLE state in which the cache line valid bit is cleared. In order to avoid having more than 8 values in op_t, this combines OP_STORE_MISS and OP_STORE_HIT into a single state. A new OP_NOP state is used for operations which can complete immediately without changing any dcache state (now used for dcbt/dcbtst causing access exception or on a non-cachable page, or dcbf that misses the cache). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	b181d28df2	dcache: Cancel reservation on snooped store This restructures the reservation machinery so that the reservation is cleared when a snooped store by another agent is done to reservation address. The reservation address is now a real address rather than an effective address. For store-conditional, it is possible that a snooped store to the reservation address could come in even after we have asserted cyc and stb on the wishbone to do the store, and that should cause the store not to be performed. To achieve this, store-conditional now uses a separate state in the r1 state machine, which is set up so that losing the reservation due to a snooped store cause cyc and stb to be dropped immediately, and the store-conditional fails. For load-reserve, the reservation address is set at the end of cycle 1 and the reservation is made valid when the data is available. For lqarx, the reservation is made valid when the first doubleword of data is available. For the case where a snooped write comes in on cycle 0 of a larx and hits the same cache line, we detect that the index and way of the snooped write are the same as the index and way of the larx; it is done this way because reservation.addr is not set until the real address is available at the end of cycle 1. A hit on the same index and way causes reservation.valid to be set to 0 at the end of cycle 1. For a write in cycle 1, we compare the latched address in cycle 2 with the reservation address and clear reservation.valid at the end of cycle 2 if they match. In other words we compare the reservation address with both the address being written this cycle and the address being written in the previous cycle. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	140b930ad3	tests: Add tests for lq/stq, plq/pstq and lqarx/stqcx. Lq and stq are tested in both BE and LE modes (though only 64-bit mode) by the 'modes' test. Lqarx and stqcx. are tested by the 'reservation' test in LE mode (64-bit). Plq and pstq are tested in 64-bit LE mode by the 'prefix' test. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	722f239c02	Reimplement quadword loads and stores This adds implementations of lq, plq, stq, pstq, lqarx and stqcx. Because register file addresses are now computed in decode1 before we have the decode table entry for the instruction, we have to check the icode directly to know when to read register RS|1 before RS (i.e. for stq and stqcx in LE mode, but not pstq). For the second instance of the instruction, loadstore1 uses the EA from the first instance + 8. It generates an alignment interrupt for unaligned lqarx and stqcx and for lq in LE mode with an unaligned address. (The reason for the latter case is that it writes RT|1 before RT, and if we have RA = RT|1 and the second instance traps, we will have overwritten RA.) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	d358981d43	Generate doubled instructions in decode1 rather than decode2 This will allow us to read different source registers for the two pieces, which will be needed for instructions like stq. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	fa9df33f7e	Implement cfuged, pdepd and pextd This implements the cfuged, pdepd and pextd instructions in a new unit called bit_sorter (so called because cfuged and pextd can be viewed as sorting the bits of the mask). The cnt* instructions and the popcnt* instructions now use the same OP_COUNTB insn_type so as to free up an insn_type value to use for the new instructions. The new instructions are implemented using a slow and simple algorithm that takes 64 cycles to compute the result. The ex1 stage is stalled while this happens, as for a 64-bit multiply, or for a divide when there is no FPU. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	d7d7a3afd4	Implement VRSAVE SPR VRSAVE is a 32-bit software-use SPR accessible in user mode. It is stored in the SPR RAM. The value read from the RAM is trimmed to 32 bits at the ramspr_read process. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	d112a7ad94	Implement scv and rfscv The main quirk here is that scv sets LR and CTR instead of SRR0 and SRR1, and likewise rfscv uses LR and CTR. Also, scv uses a set of 128 interrupt vectors starting at 0x17000. Fortunately, the layout of the SPR RAM was already such that LR and CTR were in the even and odd halves respectively at the same index, so reading or writing LR and CTR instead of SRR0 and SRR1 is quite easy. Use of scv is subject to an FSCR bit but not an HFSCR bit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	a88fa9c459	Implement DSCR The DSCR (Data Stream Control Register) is a user-accessible SPR that controls aspects of data prefetching. It has 25 bits of state defined in the ISA. This implements the register as a 25 read/write bits that do nothing, since we don't have any prefetching. The DSCR is accessible at two SPR numbers, 3 (unprivileged) and 17 (privileged). Access via these SPR numbers is controlled by an FSCR bit and an HFSCR bit. The FSCR bit controls access via SPR 3 in user mode. The HFSCR bit controls access via SPR 3 in user mode and either SPR number in privileged non-hypervisor mode, but since we don't implement privileged non-hypervisor mode, it does essentially the same thing as the FSCR bit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	205c0e2c78	Implement the wait instruction This implements the behaviour of the 'wait 0' instruction of pausing execution of instructions until an exception arises. The exceptions that terminate a wait are a pending trace exception, external interrupt request, PMU interrupt request, or decrementer negative exception. These exception conditions terminate a wait even if not enabled to generate an interrupt (e.g. if MSR[EE] is zero). This is implemented by having execute1 assert its busy_out signal while the wait state exists. The wait state is set by the completion of the wait instruction and cleared by a pending exception. If the WC operand of the wait instruction is non-zero, indicating wait for reservation loss or wait for a short period, then the wait instruction does not wait, but just acts as a no-op. In order to make space in the insn_type_t type without going over 64 elements, this combines OP_DCBT and OP_ICBT into a single OP_XCBT, since they were both no-ops (except for their influence on how SRR1 is set on a trace interrupt, where they were identical). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	7bc7f335f1	Implement CTRL register The CTRL register has a single bit called RUN. It has some unusual behaviours: - It can only be written via SPR number 152, which is privileged - It can only be read via SPR number 136, which is non-privileged - Reading in problem state (user mode) returns the RUN bit in bit 0, but reading in privileged state (hypervisor mode) returns the RUN bit in bits 0 and 15. - Reading SPR 152 in problem state causes a HEAI (illegal instruction) interrupt, but reading in privileged state is a no-op; this is the same as for an unimplemented SPR. The RUN bit goes to the PMU and is also plumbed out to drive a LED on the Arty board. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	ff0744b795	execute1: Make CFAR able to be written using mtspr and read using DMI debug mtspr to CFAR is currently a no-op, which is not what should happen. Make it set the contents of CFAR. Also provide access to CFAR via the DMI debug interface as register 0x31. Fixes: `c2da82764f` ("core: Implement CFAR register", 2020-06-15) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	d2777dd1dd	Generate Hypervisor Emulation Assistance Interrupt for illegal instructions This implements the HEIR register (Hypervisor Emulation Instruction Register) and arranges for an illegal instruction to cause a Hypervisor Emulation Assistance Interrupt (HEAI) at vector 0xE40, and set HEIR to the illegal instruction. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	e3f4ccedec	Implement facility unavailable and hypervisor facility unavailable interrupts This adds the FSCR and HFSCR registers and implements the associated behaviours of taking a facility unavailable or hypervisor facility unavailable interrupt if certain actions are attempted while the relevant [H]FSCR bit is zero. At present, two FSCR enable bits and three HFSCR enable bits are implemented. FSCR has bits for prefixed instructions and accesses to the TAR register, and HFSCR has those plus a bit that enables access to floating-point registers and instructions. FSCR and HFSCR can be accessed through the debug interface using register addresses 0x2e and 0x2f. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	12a3d76217	Implement hrfid and make MSR[HV] always 1 Implementations without hypervisor/LPAR support are permitted by the architecture, but should have MSR[HV] forced to be 1 at all times, not 0, and should implement various instructions and registers that are only accessible in hypervisor mode. This commit implements MSR[HV] as a constant 1 bit and adds the hrfid instruction, which behaves exactly the same as rfid except that it reads HSRR0/1 instead of SRR0/1. We already have HSRR0/1 and HSPRG0/1 implemented. When HV=1, Linux expects external interrupts to arrive as hypervisor interrupts, so this adds support for hypervisor interrupts (i.e., those that set HSRR0/1) and makes the external interrupt be a hypervisor interrupt. (If we had an LPCR register, the LPES bit would control this, but we don't.) The xics test is updated to read HSRR0/1 after an external interrupt. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	6ef9395f10	Remove vestiges of the short (16-bit) multiplier option (#432) These aren't needed, and should have been removed in `d1e8e62fee` ("Remove option for "short" 16x16 bit multiplier", 2022-07-19), but were missed. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	463ac2e32e	ci: Use newer version of actions/upload-artifact (#433) v2 is now deprecated and causes the test run to fail. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	41da88e6d1	Merge pull request #428 from paulusmack/ecpix-5 ECPIX-5 support	1 year ago
Paul Mackerras	8be7c53ea0	arty a7: Fix build error with Vivado (#429) Commit `0ceace927c` ("Xilinx FPGAs: Eliminate Vivado critical warnings", 2024-03-08) incorrectly removed the constraints for shield_io36 through to shield_io44 (due to me applying the wrong version of a patch), resulting in Vivado giving compile errors when building for the Arty A7. This restores the constraints. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Michael Neuling	4b1e7c8d75	Merge pull request #427 from paulusmack/fixes Various FPU and warning fixes	1 year ago
Paul Mackerras	84ae593a09	ECPIX-5: Add liteeth support Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	965b1cbcfe	liteeth: Regenerate from current upstream litex Some signals have changed names: "eth_" has been dropped from the names of the MII/GMII/RGMII signals. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	0ceace927c	Xilinx FPGAs: Eliminate Vivado critical warnings This resolves various warnings and critical warnings from Vivado. In particular, the asynchronous loops in the xilinx hardware RNG were giving a lot of critical warnings, which proved to be difficult to suppress, so this instead makes all the xilinx platforms use the 'nonrandom.vhdl' implementation, which always returns an error. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	0605039974	fetch1: Fix compiler warning with newer ghdl This fixes the following warning: fetch1.vhdl:293:18:warning: declaration of "eaa_priv" hides signal "eaa_priv" [-Whide] variable eaa_priv : std_ulogic; ^ In fact the signal "eaa_priv" is unused, so remove it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	7f781b835d	tests/fpu: Add tests for ftdiv and ftsqrt Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	f4d28d1521	FPU: Fix ftdiv and ftsqrt instructions With ftdiv, we weren't setting result_exp to B.exponent before testing result_exp in state FTDIV_1; the fix is to transfer B.exponent to result_exp in state DO_FTDIV. With ftsqrt, we were setting bit 1 of the destination CR field to 0 always, due to a typo. Also move a couple of statements around to try to get slightly simpler logic. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	95595af08d	FPU: Fix typo in expression for exp_huge Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	4199f896a1	ECPIX-5: Add litesdcard support Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	c1f23e7417	litesdcard: Regenerate verilog code with buffer direction controls This regenerates the verilog code from upstream litex plus a patch to generate outputs from the litesdcard module for controlling bidirectional buffers between the FPGA and SD card. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	264e609fd4	litesdcard: Name targets by vendor.frequency, not just vendor In future we will want to support targets using the same vendor but running at different clock frequencies. Since the clock frequency is a parameter to the gateware generation process, we now name the target directories as "vendor.frequency", i.e., "xilinx.100e6" and "lattice.48e6" rather than "xilinx" and "lattice". Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	e5d64f075d	ECPIX5: Enable FPU and BTC Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	2e8dc3f449	ECPIX-5: Add litedram support Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	8e9ec4d1b7	ECPIX-5: Add pin definitions for the PMOD ports Not wired to anything at this point. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	82dacf2c1c	ECPIX-5: Wire up SPI flash The flash chip on my board is an ISSI IS26LP256P chip. The ISSI chip requires slightly different setup for quad mode from the other brands, but works fine with the existing SPI flash interface logic here. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	166e3f4ab2	ECPIX-5: Add basic support Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	14359affbb	litedram: Regenerate gateware and software from recent upstream litedram Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	18911455c6	FPU: Fix fsel instruction to not alter FPSCR (#426) The fsel instruction is not supposed to alter FPSCR, but it was clearing FR and FI. Fix this. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	af2d6e268e	Merge pull request #425 from paulusmack/fixes Fixes to the FPU and the run_test script	1 year ago
Paul Mackerras	7b86bf8863	tests/fpu: Add tests for fdiv and fre with denormalized operands Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	51954671f3	FPU: Fix behaviour of fdiv with denormalized divisor Renormalization of the divisor for fdiv[s] was adjusting the result exponent in the wrong direction, making the result smaller in magnitude than it should be by a power of 2. Fix this by negating r.shift in the RENORM_B2 state and then subtracting it in the LOOKUP cycle. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	59a7996f1c	tests/fpu: Add checks for correct setting of FPRF Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 years ago
Paul Mackerras	eecf1ca399	FPU: Fix setting of FPRF The sign recorded in FPRF was sometimes wrong because we weren't doing the modifications that were done in pack_dp when setting FPRF (FPSCR field). These modifications are: set sign for zero result of subtraction based on rounding mode; negate result for fnmadd/sub; but don't modify sign of NaNs. Instead we now do these modifications in the main state machine code and put the result in an 'rsign' variable that is used to set v.res_sign, then r.res_sign is used in the next cycle both for setting FPRF and in the pack_dp functions. That simplifies pack_dp and lets us get rid of r.res_negate, r.res_subtract and r.res_rmode. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 years ago
Paul Mackerras	9a4f0c18e1	scripts/run_test: Use grep -E instead of egrep Grep in Fedora 39 has started warning when invoked as 'egrep', so use grep -E instead to avoid the warnings. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 years ago
Paul Mackerras	fdcb6ec449	Merge pull request #422 from paulusmack/real-icache Icache improvements - use synchronous RAMs and remove 4kB per set limit	2 years ago
Michael Neuling	63d4553fae	Merge pull request #423 from nickg/fix-vunit Fix compatibility with VUnit 5	2 years ago
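
Commit `fa9df33f7e` above describes cfuged, pdepd and pextd as handled by a slow, simple algorithm that takes 64 cycles. As a rough software reference for what those three instructions compute, here is a minimal Python sketch that walks the 64 bit positions one at a time. It is not taken from the bit_sorter VHDL; the function names, the `MASK64` constant and the loop structure are assumptions made for illustration, and bit 0 here means the least significant bit.

```python
MASK64 = (1 << 64) - 1  # assumed helper: confine values to 64 bits


def pextd(rs, rb):
    """Gather the bits of rs selected by mask rb into the low-order bits."""
    result = 0
    out = 0  # next free position in the result
    for i in range(64):  # one bit position per step
        if (rb >> i) & 1:
            result |= ((rs >> i) & 1) << out
            out += 1
    return result


def pdepd(rs, rb):
    """Scatter the low-order bits of rs to the positions set in mask rb."""
    result = 0
    src = 0  # next source bit to consume
    for i in range(64):
        if (rb >> i) & 1:
            result |= ((rs >> src) & 1) << i
            src += 1
    return result


def cfuged(rs, rb):
    """Centrifuge: bits of rs under mask 1s go to the low end, bits under
    mask 0s go to the high end, each group keeping its original order."""
    rb &= MASK64
    ones = pextd(rs, rb)
    zeros = pextd(rs, ~rb & MASK64)
    count = bin(rb).count("1")  # number of mask bits set
    return ((zeros << count) | ones) & MASK64


if __name__ == "__main__":
    rs, rb = 0b10110010, 0b11110000
    print(bin(pextd(rs, rb)))      # bits 7..4 of rs gathered at the bottom
    print(bin(pdepd(0b1011, rb)))  # low four bits spread back under the mask
    print(bin(cfuged(rs, rb)))     # unselected bits above, selected bits below
```

Viewed this way, cfuged is two pextd passes (one with the mask, one with its complement) whose results are concatenated, which is one way to read the commit's remark that cfuged and pextd sort the bits of the mask.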

1362 Commits (d531e8aa1077f81f0b48d8112b3c5d9af684d453)