microwatt

Commit Graph

Author	SHA1	Message	Date
Paul Mackerras	ba4614c5f4	dcache: Implement data cache touch and flush instructions This implements dcbf, dcbt and dcbtst in the dcache. The dcbst (data cache block store) instruction remains a no-op because our dcache is write-through and therefore never has modified data that could need to be written back. Dcbt (data cache block touch) and dcbtst (data cache block touch for store) behave similarly except that dcbtst is a no-op on a readonly page. Neither instruction ever causes an interrupt. If they miss in the cache and the page is cacheable, they are handled like a load miss except that they complete as soon as the state machine starts handling the load miss rather than waiting for any data. Dcbf (data cache block flush) can cause a data storage interrupt. If it hits in the cache, the state machine goes to a new FLUSH_CYCLE state in which the cache line valid bit is cleared. In order to avoid having more than 8 values in op_t, this combines OP_STORE_MISS and OP_STORE_HIT into a single state. A new OP_NOP state is used for operations which can complete immediately without changing any dcache state (now used for dcbt/dcbtst causing an access exception or on a non-cacheable page, or dcbf that misses the cache). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	b181d28df2	dcache: Cancel reservation on snooped store This restructures the reservation machinery so that the reservation is cleared when a snooped store by another agent is done to the reservation address. The reservation address is now a real address rather than an effective address. For store-conditional, it is possible that a snooped store to the reservation address could come in even after we have asserted cyc and stb on the wishbone to do the store, and that should cause the store not to be performed. To achieve this, store-conditional now uses a separate state in the r1 state machine, which is set up so that losing the reservation due to a snooped store causes cyc and stb to be dropped immediately, and the store-conditional fails. For load-reserve, the reservation address is set at the end of cycle 1 and the reservation is made valid when the data is available. For lqarx, the reservation is made valid when the first doubleword of data is available. For the case where a snooped write comes in on cycle 0 of a larx and hits the same cache line, we detect that the index and way of the snooped write are the same as the index and way of the larx; it is done this way because reservation.addr is not set until the real address is available at the end of cycle 1. A hit on the same index and way causes reservation.valid to be set to 0 at the end of cycle 1. For a write in cycle 1, we compare the latched address in cycle 2 with the reservation address and clear reservation.valid at the end of cycle 2 if they match. In other words we compare the reservation address with both the address being written this cycle and the address being written in the previous cycle. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	140b930ad3	tests: Add tests for lq/stq, plq/pstq and lqarx/stqcx. Lq and stq are tested in both BE and LE modes (though only 64-bit mode) by the 'modes' test. Lqarx and stqcx. are tested by the 'reservation' test in LE mode (64-bit). Plq and pstq are tested in 64-bit LE mode by the 'prefix' test. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	722f239c02	Reimplement quadword loads and stores This adds implementations of lq, plq, stq, pstq, lqarx and stqcx. Because register file addresses are now computed in decode1 before we have the decode table entry for the instruction, we have to check the icode directly to know when to read register RS|1 before RS (i.e. for stq and stqcx in LE mode, but not pstq). For the second instance of the instruction, loadstore1 uses the EA from the first instance + 8. It generates an alignment interrupt for unaligned lqarx and stqcx and for lq in LE mode with an unaligned address. (The reason for the latter case is that it writes RT|1 before RT, and if we have RA = RT|1 and the second instance traps, we will have overwritten RA.) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	d358981d43	Generate doubled instructions in decode1 rather than decode2 This will allow us to read different source registers for the two pieces, which will be needed for instructions like stq. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	fa9df33f7e	Implement cfuged, pdepd and pextd This implements the cfuged, pdepd and pextd instructions in a new unit called bit_sorter (so called because cfuged and pextd can be viewed as sorting the bits of the mask). The cnt* instructions and the popcnt* instructions now use the same OP_COUNTB insn_type so as to free up an insn_type value to use for the new instructions. The new instructions are implemented using a slow and simple algorithm that takes 64 cycles to compute the result. The ex1 stage is stalled while this happens, as for a 64-bit multiply, or for a divide when there is no FPU. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	d7d7a3afd4	Implement VRSAVE SPR VRSAVE is a 32-bit software-use SPR accessible in user mode. It is stored in the SPR RAM. The value read from the RAM is trimmed to 32 bits at the ramspr_read process. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	d112a7ad94	Implement scv and rfscv The main quirk here is that scv sets LR and CTR instead of SRR0 and SRR1, and likewise rfscv uses LR and CTR. Also, scv uses a set of 128 interrupt vectors starting at 0x17000. Fortunately, the layout of the SPR RAM was already such that LR and CTR were in the even and odd halves respectively at the same index, so reading or writing LR and CTR instead of SRR0 and SRR1 is quite easy. Use of scv is subject to an FSCR bit but not an HFSCR bit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	a88fa9c459	Implement DSCR The DSCR (Data Stream Control Register) is a user-accessible SPR that controls aspects of data prefetching. It has 25 bits of state defined in the ISA. This implements the register as 25 read/write bits that do nothing, since we don't have any prefetching. The DSCR is accessible at two SPR numbers, 3 (unprivileged) and 17 (privileged). Access via these SPR numbers is controlled by an FSCR bit and an HFSCR bit. The FSCR bit controls access via SPR 3 in user mode. The HFSCR bit controls access via SPR 3 in user mode and either SPR number in privileged non-hypervisor mode, but since we don't implement privileged non-hypervisor mode, it does essentially the same thing as the FSCR bit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	205c0e2c78	Implement the wait instruction This implements the behaviour of the 'wait 0' instruction of pausing execution of instructions until an exception arises. The exceptions that terminate a wait are a pending trace exception, external interrupt request, PMU interrupt request, or decrementer negative exception. These exception conditions terminate a wait even if not enabled to generate an interrupt (e.g. if MSR[EE] is zero). This is implemented by having execute1 assert its busy_out signal while the wait state exists. The wait state is set by the completion of the wait instruction and cleared by a pending exception. If the WC operand of the wait instruction is non-zero, indicating wait for reservation loss or wait for a short period, then the wait instruction does not wait, but just acts as a no-op. In order to make space in the insn_type_t type without going over 64 elements, this combines OP_DCBT and OP_ICBT into a single OP_XCBT, since they were both no-ops (except for their influence on how SRR1 is set on a trace interrupt, where they were identical). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	7bc7f335f1	Implement CTRL register The CTRL register has a single bit called RUN. It has some unusual behaviours: - It can only be written via SPR number 152, which is privileged - It can only be read via SPR number 136, which is non-privileged - Reading in problem state (user mode) returns the RUN bit in bit 0, but reading in privileged state (hypervisor mode) returns the RUN bit in bits 0 and 15. - Reading SPR 152 in problem state causes a HEAI (illegal instruction) interrupt, but reading in privileged state is a no-op; this is the same as for an unimplemented SPR. The RUN bit goes to the PMU and is also plumbed out to drive a LED on the Arty board. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	ff0744b795	execute1: Make CFAR able to be written using mtspr and read using DMI debug mtspr to CFAR is currently a no-op, which is not what should happen. Make it set the contents of CFAR. Also provide access to CFAR via the DMI debug interface as register 0x31. Fixes: `c2da82764f` ("core: Implement CFAR register", 2020-06-15) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	d2777dd1dd	Generate Hypervisor Emulation Assistance Interrupt for illegal instructions This implements the HEIR register (Hypervisor Emulation Instruction Register) and arranges for an illegal instruction to cause a Hypervisor Emulation Assistance Interrupt (HEAI) at vector 0xE40, and set HEIR to the illegal instruction. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	e3f4ccedec	Implement facility unavailable and hypervisor facility unavailable interrupts This adds the FSCR and HFSCR registers and implements the associated behaviours of taking a facility unavailable or hypervisor facility unavailable interrupt if certain actions are attempted while the relevant [H]FSCR bit is zero. At present, two FSCR enable bits and three HFSCR enable bits are implemented. FSCR has bits for prefixed instructions and accesses to the TAR register, and HFSCR has those plus a bit that enables access to floating-point registers and instructions. FSCR and HFSCR can be accessed through the debug interface using register addresses 0x2e and 0x2f. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	12a3d76217	Implement hrfid and make MSR[HV] always 1 Implementations without hypervisor/LPAR support are permitted by the architecture, but should have MSR[HV] forced to be 1 at all times, not 0, and should implement various instructions and registers that are only accessible in hypervisor mode. This commit implements MSR[HV] as a constant 1 bit and adds the hrfid instruction, which behaves exactly the same as rfid except that it reads HSRR0/1 instead of SRR0/1. We already have HSRR0/1 and HSPRG0/1 implemented. When HV=1, Linux expects external interrupts to arrive as hypervisor interrupts, so this adds support for hypervisor interrupts (i.e., those that set HSRR0/1) and makes the external interrupt be a hypervisor interrupt. (If we had an LPCR register, the LPES bit would control this, but we don't.) The xics test is updated to read HSRR0/1 after an external interrupt. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	6ef9395f10	Remove vestiges of the short (16-bit) multiplier option (#432) These aren't needed, and should have been removed in `d1e8e62fee` ("Remove option for "short" 16x16 bit multiplier", 2022-07-19), but were missed. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	463ac2e32e	ci: Use newer version of actions/upload-artifact (#433) v2 is now deprecated and causes the test run to fail. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	41da88e6d1	Merge pull request #428 from paulusmack/ecpix-5 ECPIX-5 support	1 year ago
Paul Mackerras	8be7c53ea0	arty a7: Fix build error with Vivado (#429) Commit `0ceace927c` ("Xilinx FPGAs: Eliminate Vivado critical warnings", 2024-03-08) incorrectly removed the constraints for shield_io36 through to shield_io44 (due to me applying the wrong version of a patch), resulting in Vivado giving compile errors when building for the Arty A7. This restores the constraints. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Michael Neuling	4b1e7c8d75	Merge pull request #427 from paulusmack/fixes Various FPU and warning fixes	1 year ago
Paul Mackerras	84ae593a09	ECPIX-5: Add liteeth support Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	965b1cbcfe	liteeth: Regenerate from current upstream litex Some signals have changed names: "eth_" has been dropped from the names of the MII/GMII/RGMII signals. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	0ceace927c	Xilinx FPGAs: Eliminate Vivado critical warnings This resolves various warnings and critical warnings from Vivado. In particular, the asynchronous loops in the xilinx hardware RNG were giving a lot of critical warnings, which proved to be difficult to suppress, so this instead makes all the xilinx platforms use the 'nonrandom.vhdl' implementation, which always returns an error. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	0605039974	fetch1: Fix compiler warning with newer ghdl This fixes the following warning: fetch1.vhdl:293:18:warning: declaration of "eaa_priv" hides signal "eaa_priv" [-Whide] variable eaa_priv : std_ulogic; ^ In fact the signal "eaa_priv" is unused, so remove it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	7f781b835d	tests/fpu: Add tests for ftdiv and ftsqrt Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	f4d28d1521	FPU: Fix ftdiv and ftsqrt instructions With ftdiv, we weren't setting result_exp to B.exponent before testing result_exp in state FTDIV_1; the fix is to transfer B.exponent to result_exp in state DO_FTDIV. With ftsqrt, we were setting bit 1 of the destination CR field to 0 always, due to a typo. Also move a couple of statements around to try to get slightly simpler logic. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	95595af08d	FPU: Fix typo in expression for exp_huge Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	4199f896a1	ECPIX-5: Add litesdcard support Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	c1f23e7417	litesdcard: Regenerate verilog code with buffer direction controls This regenerates the verilog code from upstream litex plus a patch to generate outputs from the litesdcard module for controlling bidirectional buffers between the FPGA and SD card. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	264e609fd4	litesdcard: Name targets by vendor.frequency, not just vendor In future we will want to support targets using the same vendor but running at different clock frequencies. Since the clock frequency is a parameter to the gateware generation process, we now name the target directories as "vendor.frequency", i.e., "xilinx.100e6" and "lattice.48e6" rather than "xilinx" and "lattice". Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	e5d64f075d	ECPIX5: Enable FPU and BTC Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	2e8dc3f449	ECPIX-5: Add litedram support Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	8e9ec4d1b7	ECPIX-5: Add pin definitions for the PMOD ports Not wired to anything at this point. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	82dacf2c1c	ECPIX-5: Wire up SPI flash The flash chip on my board is an ISSI IS26LP256P chip. The ISSI chip requires slightly different setup for quad mode from the other brands, but works fine with the existing SPI flash interface logic here. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	166e3f4ab2	ECPIX-5: Add basic support Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	14359affbb	litedram: Regenerate gateware and software from recent upstream litedram Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	18911455c6	FPU: Fix fsel instruction to not alter FPSCR (#426) The fsel instruction is not supposed to alter FPSCR, but it was clearing FR and FI. Fix this. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	af2d6e268e	Merge pull request #425 from paulusmack/fixes Fixes to the FPU and the run_test script	1 year ago
Paul Mackerras	7b86bf8863	tests/fpu: Add tests for fdiv and fre with denormalized operands Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	51954671f3	FPU: Fix behaviour of fdiv with denormalized divisor Renormalization of the divisor for fdiv[s] was adjusting the result exponent in the wrong direction, making the result smaller in magnitude than it should be by a power of 2. Fix this by negating r.shift in the RENORM_B2 state and then subtracting it in the LOOKUP cycle. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	59a7996f1c	tests/fpu: Add checks for correct setting of FPRF Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 years ago
Paul Mackerras	eecf1ca399	FPU: Fix setting of FPRF The sign recorded in FPRF was sometimes wrong because we weren't doing the modifications that were done in pack_dp when setting FPRF (FPSCR field). These modifications are: set sign for zero result of subtraction based on rounding mode; negate result for fnmadd/sub; but don't modify sign of NaNs. Instead we now do these modifications in the main state machine code and put the result in an 'rsign' variable that is used to set v.res_sign, then r.res_sign is used in the next cycle both for setting FPRF and in the pack_dp functions. That simplifies pack_dp and lets us get rid of r.res_negate, r.res_subtract and r.res_rmode. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 years ago
Paul Mackerras	9a4f0c18e1	scripts/run_test: Use grep -E instead of egrep Grep in Fedora 39 has started warning when invoked as 'egrep', so use grep -E instead to avoid the warnings. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 years ago
Paul Mackerras	fdcb6ec449	Merge pull request #422 from paulusmack/real-icache Icache improvements - use synchronous RAMs and remove 4kB per set limit	2 years ago
Michael Neuling	63d4553fae	Merge pull request #423 from nickg/fix-vunit Fix compatibility with VUnit 5	2 years ago
Nick Gasson	3affa96e28	Fix compatibility with latest VUnit release Signed-off-by: Nick Gasson <nick@nickg.me.uk>	2 years ago
Paul Mackerras	73a2fcbc7f	icache_tb: Update for recent icache changes - Provide next_nia before clock edge where req is asserted - Set rpn and next_rpn to zero - There is no longer an input to the icache from the MMU Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 years ago
Paul Mackerras	73b6004ac6	icache: Use next real address to index icache Now that we are translating the fetch effective address to real one cycle earlier, we can use the real address to index the icache array. This has the benefit that the set size can be larger than a page, enabling us to configure the icache to be larger without having to increase its associativity. Previously the set size was limited to the page size to avoid aliasing problems. Thus for example a 32kB icache would need to be 8-way associative, resulting in large numbers of LUTs being used for tag comparisons in FPGA implementations, and poor timing. With this change, a 32kB icache can be 1 or 2-way associative, which means deeper and narrower tag and data RAMs and fewer tag comparators. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 years ago
Paul Mackerras	f9e5622327	Move iTLB from icache to fetch1 This moves the address translation step for instruction fetches one cycle earlier, so that it now happens in the fetch1 stage. There is now a 2-entry mini translation cache ("ERAT", or effective to real address translation cache) which operates on the output of the multiplexer that selects the instruction address for the next cycle. The ERAT consists of two effective address registers and two corresponding real address registers. They store the page number part of the addresses for a 4kB page size, which is the smallest page size supported by the architecture. If the effective address doesn't match either of the EA registers, and address translation is enabled, then i_out.req goes low for two cycles while the iTLB is looked up. Experimentally, this delay results in a 0.1% drop in coremark performance; allowing two cycles for the lookup results in better timing. The result from the iTLB is placed into the least recently used ERAT entry and then used to translate the address as normal. If address translation is not enabled then the EA is used directly as the real address. The iTLB structure is the same as it was before; direct mapped, indexed using a hashed EA. The "fetch failed" signal, which indicates a TLB miss or protection violation, is now generated in fetch1 and passed through icache. When it is asserted, fetch1 goes into a stalled state until a PTE arrives from the MMU (which gets put into both the iTLB and the ERAT), or an interrupt or redirect occurs. Any TLB invalidations from the MMU invalidate the whole ERAT. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 years ago
Paul Mackerras	27c50bc311	Makefile: Remove overriding of ICACHE_NUM_LINES on ECP5 platforms Now that the icache tag RAM is accessed synchronously, the free tools recognize it as block RAM on ECP5-based platforms; thus we no longer need to force it to a very small value. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 years ago
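
Commit fa9df33f7e above describes cfuged, pdepd and pextd as implemented by a slow, simple algorithm that takes 64 cycles. The commit message does not spell that algorithm out, so the following is only a software reference model of what the three instructions compute, written as one loop iteration per bit. The function names mirror the instruction mnemonics; everything else (the `MASK64` constant, the use of least-significant-bit-first indexing, and building cfuged from two extracts) is an assumption made for illustration and is not taken from the bit_sorter VHDL.

```python
MASK64 = (1 << 64) - 1  # hypothetical helper constant: all 64 bits set


def pextd(rs, mask):
    """Gather the bits of rs selected by mask into the low-order bits."""
    result = 0
    out = 0  # next free position in the result
    for i in range(64):  # one iteration per bit, like one cycle per bit
        if (mask >> i) & 1:
            result |= ((rs >> i) & 1) << out
            out += 1
    return result


def pdepd(rs, mask):
    """Scatter the low-order bits of rs to the positions selected by mask."""
    result = 0
    src = 0  # next bit of rs to consume
    for i in range(64):
        if (mask >> i) & 1:
            result |= ((rs >> src) & 1) << i
            src += 1
    return result


def cfuged(rs, mask):
    """Move mask-1 bits to the low end and mask-0 bits to the high end,
    keeping the relative order within each group."""
    ones = bin(mask & MASK64).count("1")
    low = pextd(rs, mask)              # bits where the mask is 1
    high = pextd(rs, ~mask & MASK64)   # bits where the mask is 0
    return ((high << ones) & MASK64) | low
```

Seen this way, cfuged and pextd both "sort" the bits of RS by the corresponding mask bit, which matches the commit's explanation of the unit name. A quick sanity check is that `pdepd(pextd(x, m), m)` equals `x & m` for any 64-bit `x` and `m`.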

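The associativity argument in commit 73b6004ac6 is plain arithmetic: while the icache was indexed by effective address, the set size could not exceed the page size, so the number of ways had to be at least the cache size divided by the page size. A minimal sketch of that calculation follows; the function names are hypothetical, and "set size" is used as in the commit message, meaning the amount of cache covered by one way.

```python
def min_ways_effective_indexed(cache_bytes, page_bytes=4096):
    """Smallest associativity when one way may not be larger than a page.

    Uses ceiling division so that a cache smaller than a page still
    needs one way.
    """
    return max(1, -(-cache_bytes // page_bytes))


def set_size_bytes(cache_bytes, ways):
    """Bytes covered by each way for a given associativity."""
    return cache_bytes // ways


if __name__ == "__main__":
    cache = 32 * 1024
    print("ways needed with 4kB limit:", min_ways_effective_indexed(cache))
    for ways in (1, 2):
        print(ways, "way(s) ->", set_size_bytes(cache, ways), "bytes per way")
```

With real-address indexing the page-size limit no longer applies, so the second function can be evaluated for 1 or 2 ways, giving the deeper and narrower RAMs the commit mentions.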

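The 2-entry ERAT described in commit f9e5622327 can be pictured with a small behavioural model. This is a sketch under stated assumptions rather than a transcription of fetch1: the class and method names are hypothetical, and the choice to refresh the least-recently-used marker on every hit is an assumption (the commit only says that an iTLB result goes into the least recently used entry). What it does take from the commit is the 4kB page granularity, the two EA/RA register pairs, and whole-ERAT invalidation.

```python
PAGE_SHIFT = 12                      # 4kB pages, the smallest architected size
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class Erat:
    """Behavioural model of a 2-entry effective-to-real translation cache."""

    def __init__(self):
        # Each entry is (ea_page, ra_page), or None when invalid.
        self.entries = [None, None]
        self.lru = 0  # index of the least recently used entry

    def lookup(self, ea):
        """Return the real address for ea, or None on an ERAT miss."""
        page = ea >> PAGE_SHIFT
        for i, entry in enumerate(self.entries):
            if entry is not None and entry[0] == page:
                self.lru = 1 - i  # assumption: a hit makes the other entry LRU
                return (entry[1] << PAGE_SHIFT) | (ea & OFFSET_MASK)
        return None  # the caller would now consult the iTLB

    def fill(self, ea, ra):
        """Install a translation (e.g. an iTLB result) in the LRU entry."""
        self.entries[self.lru] = (ea >> PAGE_SHIFT, ra >> PAGE_SHIFT)
        self.lru = 1 - self.lru

    def invalidate_all(self):
        """Model a TLB invalidation from the MMU: drop every entry."""
        self.entries = [None, None]
```

In this model a `None` result from `lookup` corresponds to the case where i_out.req goes low while the iTLB is looked up; with address translation disabled the caller would skip the ERAT and use the EA as the real address directly.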
1357 Commits (ba4614c5f4cd6fa56079151a6be44f92790e0b2b)
