You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
microwatt/insn_helpers.vhdl

269 lines
9.2 KiB
VHDL

library ieee;
use ieee.std_logic_1164.all;
package insn_helpers is
function insn_rs (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_rt (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_ra (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_rb (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_rcreg (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_si (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_ui (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_l (insn_in : std_ulogic_vector) return std_ulogic;
function insn_sh32 (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_mb32 (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_me32 (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_li (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_lk (insn_in : std_ulogic_vector) return std_ulogic;
function insn_aa (insn_in : std_ulogic_vector) return std_ulogic;
function insn_rc (insn_in : std_ulogic_vector) return std_ulogic;
Add basic XER support The carry is currently internal to execute1. We don't handle any of the other XER fields. This creates type called "xer_common_t" that contains the commonly used XER bits (CA, CA32, SO, OV, OV32). The value is stored in the CR file (though it could be a separate module). The rest of the bits will be implemented as a separate SPR and the two parts reconciled in mfspr/mtspr in latter commits. We always read XER in decode2 (there is little point not to) and send it down all pipeline branches as it will be needed in writeback for all type of instructions when CR0:SO needs to be updated (such forms exist for all pipeline branches even if we don't yet implement them). To avoid having to track XER hazards, we forward it back in EX1. This assumes that other pipeline branches that can modify it (mult and div) are running single issue for now. One additional hazard to beware of is an XER:SO modifying instruction in EX1 followed immediately by a store conditional. Due to our writeback latency, the store will go down the LSU with the previous XER value, thus the stcx. will set CR0:SO using an obsolete SO value. I doubt there exist any code relying on this behaviour being correct but we should account for it regardless, possibly by ensuring that stcx. remain single issue initially, or later by adding some minimal tracking or moving the LSU into the same pipeline as execute. Missing some obscure XER affecting instructions like addex or mcrxrx. [paulus@ozlabs.org - fix CA32 and OV32 for OP_ADD, fix order of arguments to set_ov] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
function insn_oe (insn_in : std_ulogic_vector) return std_ulogic;
function insn_bd (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bf (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bfa (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_cr (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bt (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_ba (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bb (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_fxm (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bo (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bi (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bh (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_d (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_ds (insn_in : std_ulogic_vector) return std_ulogic_vector;
core: Implement quadword loads and stores This implements the lq, stq, lqarx and stqcx. instructions. These instructions all access two consecutive GPRs; for example the "lq %r6,0(%r3)" instruction will load the doubleword at the address in R3 into R7 and the doubleword at address R3 + 8 into R6. To cope with having two GPR sources or destinations, the instruction gets repeated at the decode2 stage, that is, for each lq/stq/lqarx/stqcx. coming in from decode1, two instructions get sent out to execute1. For these instructions, the RS or RT register gets modified on one of the iterations by setting the LSB of the register number. In LE mode, the first iteration uses RS|1 or RT|1 and the second iteration uses RS or RT. In BE mode, this is done the other way around. In order for decode2 to know what endianness is currently in use, we pass the big_endian flag down from icache through decode1 to decode2. This is always in sync with what execute1 is using because only rfid or an interrupt can change MSR[LE], and those operations all cause a flush and redirect. There is now an extra column in the decode tables in decode1 to indicate whether the instruction needs to be repeated. Decode1 also enforces the rule that lq with RT = RT and lqarx with RA = RT or RB = RT are illegal. Decode2 now passes a 'repeat' flag and a 'second' flag to execute1, and execute1 passes them on to loadstore1. The 'repeat' flag is set for both iterations of a repeated instruction, and 'second' is set on the second iteration. Execute1 does not take asynchronous or trace interrupts on the second iteration of a repeated instruction. Loadstore1 uses 'next_addr' for the second iteration of a repeated load/store so that we access the second doubleword of the memory operand. Thus loadstore1 accesses the doublewords in increasing memory order. For 16-byte loads this means that the first iteration writes GPR RT|1. It is possible that RA = RT|1 (this is a legal but non-preferred form), meaning that if the memory operand was misaligned, the first iteration would overwrite RA but then the second iteration might take a page fault, leading to corrupted state. To avoid that possibility, 16-byte loads in LE mode take an alignment interrupt if the operand is not 16-byte aligned. (This is the case anyway for lqarx, and we enforce it for lq as well.) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
function insn_dq (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_dx (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_to (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bc (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_sh (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_me (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_mb (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_frt (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_fra (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_frb (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_frc (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_u (insn_in : std_ulogic_vector) return std_ulogic_vector;
Decode prefixed instructions This adds logic to do basic decoding of the prefixed instructions defined in PowerISA v3.1B which are in the SFFS (Scalar Fixed plus Floating-Point Subset) compliancy subset. In PowerISA v3.1B SFFS, there are 14 prefixed load/store instructions plus the prefixed no-op instruction (pnop). The prefixed load/store instructions all use an extended version of D-form, which has an extra 18 bits of displacement in the prefix, plus an 'R' bit which enables PC-relative addressing. When decode1 sees an instruction word where the insn_code is INSN_prefix (i.e. the primary opcode was 1), it stores the prefix word and sends nothing down to decode2 in that cycle. When the next valid instruction word arrives, it is interpreted as a suffix, meaning that its insn_code gets modified before being used to look up the decode table. The insn_code values are rearranged so that the values for instructions which are the suffix of a valid prefixed instruction are all at even indexes, and the corresponding prefixed instructions follow immediately, so that an insn_code value can be converted to the corresponding prefixed value by setting the LSB of the insn_code value. There are two prefixed instructions, pld and pstd, for which the suffix is not a valid SFFS instruction by itself, so these have been given dummy insn_code values which decode as illegal (INSN_op57 and INSN_op61). For a prefixed instruction, decode1 examines the type and subtype fields of the prefix and checks that the suffix is valid for the type and subtype. This check doesn't affect which entry of the decode table is used; the result is passed down to decode2, and will in future be acted upon in execute1. The instruction address passed down to decode2 is the address of the prefix. To enable this, part of the instruction address is saved when the prefix is seen, and then the instruction address received from icache is partly overlaid by the saved prefix address. Because prefixed instructions are not permitted to cross 64-byte boundaries, we only need to save bits 5:2 of the instruction to do this. If the alignment restriction ever gets relaxed, we will then need to save more bits of the address. Decode2 has been extended to handle the R bit of the prefix (in 8LS and MLS forms) and to be able to generate the 34-bit immediate value from the prefix and suffix. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 year ago
function insn_prefix_r(prefix : std_ulogic_vector) return std_ulogic;
function insn_prefixed_si(prefix : std_ulogic_vector; suffix : std_ulogic_vector)
return std_ulogic_vector;
end package insn_helpers;
package body insn_helpers is
function insn_rs (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 21);
end;
function insn_rt (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 21);
end;
function insn_ra (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(20 downto 16);
end;
function insn_rb (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 11);
end;
function insn_rcreg (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(10 downto 6);
end;
function insn_si (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 0);
end;
function insn_ui (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 0);
end;
function insn_l (insn_in : std_ulogic_vector) return std_ulogic is
begin
return insn_in(21);
end;
function insn_sh32 (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 11);
end;
function insn_mb32 (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(10 downto 6);
end;
function insn_me32 (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(5 downto 1);
end;
function insn_li (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 2);
end;
function insn_lk (insn_in : std_ulogic_vector) return std_ulogic is
begin
return insn_in(0);
end;
function insn_aa (insn_in : std_ulogic_vector) return std_ulogic is
begin
return insn_in(1);
end;
function insn_rc (insn_in : std_ulogic_vector) return std_ulogic is
begin
return insn_in(0);
end;
Add basic XER support The carry is currently internal to execute1. We don't handle any of the other XER fields. This creates type called "xer_common_t" that contains the commonly used XER bits (CA, CA32, SO, OV, OV32). The value is stored in the CR file (though it could be a separate module). The rest of the bits will be implemented as a separate SPR and the two parts reconciled in mfspr/mtspr in latter commits. We always read XER in decode2 (there is little point not to) and send it down all pipeline branches as it will be needed in writeback for all type of instructions when CR0:SO needs to be updated (such forms exist for all pipeline branches even if we don't yet implement them). To avoid having to track XER hazards, we forward it back in EX1. This assumes that other pipeline branches that can modify it (mult and div) are running single issue for now. One additional hazard to beware of is an XER:SO modifying instruction in EX1 followed immediately by a store conditional. Due to our writeback latency, the store will go down the LSU with the previous XER value, thus the stcx. will set CR0:SO using an obsolete SO value. I doubt there exist any code relying on this behaviour being correct but we should account for it regardless, possibly by ensuring that stcx. remain single issue initially, or later by adding some minimal tracking or moving the LSU into the same pipeline as execute. Missing some obscure XER affecting instructions like addex or mcrxrx. [paulus@ozlabs.org - fix CA32 and OV32 for OP_ADD, fix order of arguments to set_ov] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
function insn_oe (insn_in : std_ulogic_vector) return std_ulogic is
begin
return insn_in(10);
end;
function insn_bd (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 2);
end;
function insn_bf (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 23);
end;
function insn_bfa (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(20 downto 18);
end;
function insn_cr (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(10 downto 1);
end;
function insn_bb (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 11);
end;
function insn_ba (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(20 downto 16);
end;
function insn_bt (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 21);
end;
function insn_fxm (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(19 downto 12);
end;
function insn_bo (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 21);
end;
function insn_bi (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(20 downto 16);
end;
function insn_bh (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(12 downto 11);
end;
function insn_d (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 0);
end;
function insn_ds (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 2);
end;
core: Implement quadword loads and stores This implements the lq, stq, lqarx and stqcx. instructions. These instructions all access two consecutive GPRs; for example the "lq %r6,0(%r3)" instruction will load the doubleword at the address in R3 into R7 and the doubleword at address R3 + 8 into R6. To cope with having two GPR sources or destinations, the instruction gets repeated at the decode2 stage, that is, for each lq/stq/lqarx/stqcx. coming in from decode1, two instructions get sent out to execute1. For these instructions, the RS or RT register gets modified on one of the iterations by setting the LSB of the register number. In LE mode, the first iteration uses RS|1 or RT|1 and the second iteration uses RS or RT. In BE mode, this is done the other way around. In order for decode2 to know what endianness is currently in use, we pass the big_endian flag down from icache through decode1 to decode2. This is always in sync with what execute1 is using because only rfid or an interrupt can change MSR[LE], and those operations all cause a flush and redirect. There is now an extra column in the decode tables in decode1 to indicate whether the instruction needs to be repeated. Decode1 also enforces the rule that lq with RT = RT and lqarx with RA = RT or RB = RT are illegal. Decode2 now passes a 'repeat' flag and a 'second' flag to execute1, and execute1 passes them on to loadstore1. The 'repeat' flag is set for both iterations of a repeated instruction, and 'second' is set on the second iteration. Execute1 does not take asynchronous or trace interrupts on the second iteration of a repeated instruction. Loadstore1 uses 'next_addr' for the second iteration of a repeated load/store so that we access the second doubleword of the memory operand. Thus loadstore1 accesses the doublewords in increasing memory order. For 16-byte loads this means that the first iteration writes GPR RT|1. It is possible that RA = RT|1 (this is a legal but non-preferred form), meaning that if the memory operand was misaligned, the first iteration would overwrite RA but then the second iteration might take a page fault, leading to corrupted state. To avoid that possibility, 16-byte loads in LE mode take an alignment interrupt if the operand is not 16-byte aligned. (This is the case anyway for lqarx, and we enforce it for lq as well.) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
function insn_dq (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 4);
end;
function insn_dx (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 6) & insn_in(20 downto 16) & insn_in(0);
end;
function insn_to (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 21);
end;
function insn_bc (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(10 downto 6);
end;
function insn_sh (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(1) & insn_in(15 downto 11);
end;
function insn_me (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(5) & insn_in(10 downto 6);
end;
function insn_mb (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(5) & insn_in(10 downto 6);
end;
function insn_frt(insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 21);
end;
function insn_fra(insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(20 downto 16);
end;
function insn_frb(insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 11);
end;
function insn_frc(insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(10 downto 6);
end;
function insn_u(insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 12);
end;
Decode prefixed instructions This adds logic to do basic decoding of the prefixed instructions defined in PowerISA v3.1B which are in the SFFS (Scalar Fixed plus Floating-Point Subset) compliancy subset. In PowerISA v3.1B SFFS, there are 14 prefixed load/store instructions plus the prefixed no-op instruction (pnop). The prefixed load/store instructions all use an extended version of D-form, which has an extra 18 bits of displacement in the prefix, plus an 'R' bit which enables PC-relative addressing. When decode1 sees an instruction word where the insn_code is INSN_prefix (i.e. the primary opcode was 1), it stores the prefix word and sends nothing down to decode2 in that cycle. When the next valid instruction word arrives, it is interpreted as a suffix, meaning that its insn_code gets modified before being used to look up the decode table. The insn_code values are rearranged so that the values for instructions which are the suffix of a valid prefixed instruction are all at even indexes, and the corresponding prefixed instructions follow immediately, so that an insn_code value can be converted to the corresponding prefixed value by setting the LSB of the insn_code value. There are two prefixed instructions, pld and pstd, for which the suffix is not a valid SFFS instruction by itself, so these have been given dummy insn_code values which decode as illegal (INSN_op57 and INSN_op61). For a prefixed instruction, decode1 examines the type and subtype fields of the prefix and checks that the suffix is valid for the type and subtype. This check doesn't affect which entry of the decode table is used; the result is passed down to decode2, and will in future be acted upon in execute1. The instruction address passed down to decode2 is the address of the prefix. To enable this, part of the instruction address is saved when the prefix is seen, and then the instruction address received from icache is partly overlaid by the saved prefix address. Because prefixed instructions are not permitted to cross 64-byte boundaries, we only need to save bits 5:2 of the instruction to do this. If the alignment restriction ever gets relaxed, we will then need to save more bits of the address. Decode2 has been extended to handle the R bit of the prefix (in 8LS and MLS forms) and to be able to generate the 34-bit immediate value from the prefix and suffix. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1 year ago
function insn_prefix_r(prefix : std_ulogic_vector) return std_ulogic is
begin
return prefix(20);
end;
function insn_prefixed_si(prefix : std_ulogic_vector; suffix : std_ulogic_vector)
return std_ulogic_vector is
begin
return prefix(17 downto 0) & suffix(15 downto 0);
end;
end package body insn_helpers;