microwatt

Commit Graph

Author	SHA1	Message	Date
Paul Mackerras	d2bf3f3580	core: Implement hypervisor doorbell interrupt and msg* instructions This implements the hypervisor doorbell exception and interrupt and the msgsnd, msgclr and msgsync instructions (msgsync is a no-op). The msgsnd instruction can generate a hypervisor doorbell interrupt on any CPU in the system. To achieve this, each core sends its hypervisor doorbell messages to the soc level, which ORs together the bits for each CPU and sends it to that CPU. The privileged doorbell exception/interrupt and the msgsndp/msgclrp instructions are not required since we don't implement SMT. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	ca872faede	core: Consolidate several OP_* values into a single OP_COMPUTE This replaces OP_ADDG6S, OP_BCD, OP_BREV, OP_CMPB, OP_CMPEQB, OP_CMPRB, OP_CROP, OP_EXTS, OP_EXTSWSLI, OP_ISEL, OP_LOGIC, OP_MFCR, OP_PRTY, OP_RLC, OP_RLCL, OP_RLCR, OP_SETB, OP_SHL, OP_SHR, and OP_XOR with a single OP_COMPUTE. The replaced operations are all ones which just compute a result value (for GPR or CR) in execute1, don't have any other side effects, and aren't used in decode2 to determine other signals. The operation to be performed is sufficiently defined by the result and subresult fields in the decode table. With the elimination of OP_SPARE, this reduces the number of insn_type_t values to 44. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	9 months ago
Paul Mackerras	fa9df33f7e	Implement cfuged, pdepd and pextd This implements the cfuged, pdepd and pextd instructions in a new unit called bit_sorter (so called because cfuged and pextd can be viewed as sorting the bits of the mask). The cnt* instructions and the popcnt* instructions now use the same OP_COUNTB insn_type so as to free up an insn_type value to use for the new instructions. The new instructions are implemented using a slow and simple algorithm that takes 64 cycles to compute the result. The ex1 stage is stalled while this happens, as for a 64-bit multiply, or for a divide when there is no FPU. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	205c0e2c78	Implement the wait instruction This implements the behaviour of the 'wait 0' instruction of pausing execution of instructions until an exception arises. The exceptions that terminate a wait are a pending trace exception, external interrupt request, PMU interrupt request, or decrementer negative exception. These exception conditions terminate a wait even if not enabled to generate an interrupt (e.g. if MSR[EE] is zero). This is implemented by having execute1 assert its busy_out signal while the wait state exists. The wait state is set by the completion of the wait instruction and cleared by a pending exception. If the WC operand of the wait instruction is non-zero, indicating wait for reservation loss or wait for a short period, then the wait instruction does not wait, but just acts as a no-op. In order to make space in the insn_type_t type without going over 64 elements, this combines OP_DCBT and OP_ICBT into a single OP_XCBT, since they were both no-ops (except for their influence on how SRR1 is set on a trace interrupt, where they were identical). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	af62b9f1eb	scripts/fmt_log: Update for recent changes This updates fmt_log.c to account for the recent changes to insn_type_t and to unit_t. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 years ago
Paul Mackerras	6db626d245	icache: Log 36 bits of instruction rather than 32 This expands the field in the log buffer that stores the instruction fetched from the icache to 36 bits, so that we get the insn_code and illegal instruction indication. To do this, we reclaim 3 unused bits from execute1's portion and one other unused bit (previously just set to 0 in core.vhdl). This also alters the trigger behaviour to stop after one quarter of the log buffer has been filled with samples after the trigger, or 256 entries, whichever is less. This is to ensure that the trigger event doesn't get overwritten when the log buffer is small. This updates fmt_log to the new log format. Valid instructions are printed as a decimal insn_code value followed by the bottom 26 bits of the instruction. Illegal instructions are printed as "ill" followed by the full 32 bits of the instruction. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	932da4c114	FPU: Simplify IDLE state code Do more decoding of the instruction ahead of the IDLE state processing so that the IDLE state code becomes much simpler. To make the decoding easier, we now use four insn_type_t codes for floating-point operations rather than two. This also rearranges the insn_type_t values a little to get the 4 FP opcode values to differ only in the bottom 2 bits, and put OP_DIV, OP_DIVE and OP_MOD next to them. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	89a67a18d0	decode: Add a facility field to the instruction decode tables This makes it simpler to work out when to deliver a FPU unavailable interrupt. This also means we can get rid of the OP_FPLOAD and OP_FPSTORE insn_type values. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	856e9e955f	core: Add framework for an FPU This adds the skeleton of a floating-point unit and implements the mffs and mtfsf instructions. Execute1 sends FP instructions to the FPU and receives busy, exception, FP interrupt and illegal interrupt signals from it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Paul Mackerras	45cd8f4fc3	core: Add support for floating-point loads and stores This extends the register file so it can hold FPR values, and implements the FP loads and stores that do not require conversion between single and double precision. We now have the FP, FE0 and FE1 bits in MSR. FP loads and stores cause a FP unavailable interrupt if MSR[FP] = 0. The FPU facilities are optional and their presence is controlled by the HAS_FPU generic passed down from the top-level board file. It defaults to true for all except the A7-35 boards. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Paul Mackerras	83816cb9e3	core: Implement BCD Assist instructions addg6s, cdtbcd, cbcdtod To avoid adding too much logic, this moves the adder used by OP_ADD out of the case statement in execute1.vhdl so that the result can be used by OP_ADDG6S as well. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Paul Mackerras	290b05f97d	core: Implement the maddhd, maddhdu and maddld instructions These instructions use major opcode 4 and have a third GPR input operand, so we need a decode table for major opcode 4 and some plumbing to get the RC register operand read. The multiply-add instructions use the same insn_type_t values as the regular multiply instructions, and we distinguish in execute1 by looking at the major opcode. This turns out to be convenient because we don't have to add any cases in the code that handles the output of the multiplier, and it frees up some insn_type_t values. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Paul Mackerras	fa77a6f683	core: Implement the mcrxrx instruction This also removes OP_MCRXR, as the mcrxr instruction was removed in version 3.0B of the Power ISA, having been phased-out for the server architecture since v2.02. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Paul Mackerras	49a4d9f67a	Add core logging This logs 256 bits of data per cycle to a ring buffer in BRAM. The data collected can be read out through 2 new SPRs or through the debug interface. The new SPRs are LOG_ADDR (724) and LOG_DATA (725). LOG_ADDR contains the buffer write pointer in the upper 32 bits (in units of entries, i.e. 32 bytes) and the read pointer in the lower 32 bits (in units of doublewords, i.e. 8 bytes). Reading LOG_DATA gives the doubleword from the buffer at the read pointer and increments the read pointer. Setting bit 31 of LOG_ADDR inhibits the trace log system from writing to the log buffer, so the contents are stable and can be read. There are two new debug addresses which function similarly to the LOG_ADDR and LOG_DATA SPRs. The log is frozen while either or both of the LOG_ADDR SPR bit 31 or the debug LOG_ADDR register bit 31 are set. The buffer defaults to 2048 entries, i.e. 64kB. The size is set by the LOG_LENGTH generic on the core_debug module. Software can determine the length of the buffer because the length is ORed into the buffer write pointer in the upper 32 bits of LOG_ADDR. Hence the length of the buffer can be calculated as 1 << (31 - clz(LOG_ADDR)). There is a program to format the log entries in a somewhat readable fashion in scripts/fmt_log/fmt_log.c. The log_entry struct in that file describes the layout of the bits in the log entries. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
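
Note on 49a4d9f67a: the LOG_ADDR layout described in that commit message can be decoded in software roughly as follows. This is a minimal sketch, not code from the repository. The helper names (clz64, log_buffer_entries, log_write_index, log_read_index) are hypothetical, clz is assumed to count leading zeros over the full 64-bit register value, and the upper word is assumed to be non-zero because the buffer length is ORed into it.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros for a non-zero 64-bit value. */
static unsigned clz64(uint64_t x)
{
    unsigned n = 0;

    assert(x != 0);
    while (!(x & (UINT64_C(1) << 63))) {
        x <<= 1;
        n++;
    }
    return n;
}

/*
 * Buffer length in entries, using the formula from the commit message:
 * 1 << (31 - clz(LOG_ADDR)). Assumes the length is a power of two that
 * has been ORed into the upper 32 bits, so the upper word is non-zero.
 */
static uint32_t log_buffer_entries(uint64_t log_addr)
{
    assert((log_addr >> 32) != 0);
    return UINT32_C(1) << (31 - clz64(log_addr));
}

/* Write pointer in units of entries (32 bytes), with the length bit masked off. */
static uint32_t log_write_index(uint64_t log_addr)
{
    uint32_t upper = (uint32_t)(log_addr >> 32);

    return upper & (log_buffer_entries(log_addr) - 1);
}

/* Read pointer in units of doublewords (8 bytes), taken from the lower 32 bits. */
static uint32_t log_read_index(uint64_t log_addr)
{
    return (uint32_t)log_addr;
}
```

With the default 2048-entry buffer, the length bit lands at bit 11 of the upper word, so a 64-bit clz returns 20 and the formula gives 1 << 11.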

14 Commits (master)
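
Note on fa9df33f7e: the commit message describes a slow, simple algorithm that takes 64 cycles. A software reference model of pextd and pdepd that handles one bit position per loop iteration might look like the sketch below. It is an illustration of the instruction semantics only, not a transcription of the bit_sorter VHDL, and the function names pext64 and pdep64 are hypothetical.

```c
#include <stdint.h>

/*
 * pextd model: gather the bits of src selected by mask and pack them
 * into the low-order end of the result, preserving their order.
 */
static uint64_t pext64(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    unsigned out = 0;   /* next free bit position in the result */

    for (unsigned i = 0; i < 64; i++) {
        if ((mask >> i) & 1) {
            result |= ((src >> i) & 1) << out;
            out++;
        }
    }
    return result;
}

/*
 * pdepd model: take the low-order bits of src in order and scatter
 * them to the bit positions where mask is 1; other bits are 0.
 */
static uint64_t pdep64(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    unsigned in = 0;    /* next source bit to deposit */

    for (unsigned i = 0; i < 64; i++) {
        if ((mask >> i) & 1) {
            result |= ((src >> in) & 1) << i;
            in++;
        }
    }
    return result;
}
```

A model like this can serve as a checker when comparing simulation results against expected values, since depositing an extracted value back under the same mask recovers the masked source bits.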