microwatt/insn_helpers.vhdl

254 lines
8.7 KiB
VHDL

library ieee;
use ieee.std_logic_1164.all;
package insn_helpers is
function insn_rs (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_rt (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_ra (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_rb (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_rcreg (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_si (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_ui (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_l (insn_in : std_ulogic_vector) return std_ulogic;
function insn_sh32 (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_mb32 (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_me32 (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_li (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_lk (insn_in : std_ulogic_vector) return std_ulogic;
function insn_aa (insn_in : std_ulogic_vector) return std_ulogic;
function insn_rc (insn_in : std_ulogic_vector) return std_ulogic;
Add basic XER support The carry is currently internal to execute1. We don't handle any of the other XER fields. This creates type called "xer_common_t" that contains the commonly used XER bits (CA, CA32, SO, OV, OV32). The value is stored in the CR file (though it could be a separate module). The rest of the bits will be implemented as a separate SPR and the two parts reconciled in mfspr/mtspr in latter commits. We always read XER in decode2 (there is little point not to) and send it down all pipeline branches as it will be needed in writeback for all type of instructions when CR0:SO needs to be updated (such forms exist for all pipeline branches even if we don't yet implement them). To avoid having to track XER hazards, we forward it back in EX1. This assumes that other pipeline branches that can modify it (mult and div) are running single issue for now. One additional hazard to beware of is an XER:SO modifying instruction in EX1 followed immediately by a store conditional. Due to our writeback latency, the store will go down the LSU with the previous XER value, thus the stcx. will set CR0:SO using an obsolete SO value. I doubt there exist any code relying on this behaviour being correct but we should account for it regardless, possibly by ensuring that stcx. remain single issue initially, or later by adding some minimal tracking or moving the LSU into the same pipeline as execute. Missing some obscure XER affecting instructions like addex or mcrxrx. [paulus@ozlabs.org - fix CA32 and OV32 for OP_ADD, fix order of arguments to set_ov] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
function insn_oe (insn_in : std_ulogic_vector) return std_ulogic;
function insn_bd (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bf (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bfa (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_cr (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bt (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_ba (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bb (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_fxm (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bo (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bi (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bh (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_d (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_ds (insn_in : std_ulogic_vector) return std_ulogic_vector;
core: Implement quadword loads and stores This implements the lq, stq, lqarx and stqcx. instructions. These instructions all access two consecutive GPRs; for example the "lq %r6,0(%r3)" instruction will load the doubleword at the address in R3 into R7 and the doubleword at address R3 + 8 into R6. To cope with having two GPR sources or destinations, the instruction gets repeated at the decode2 stage, that is, for each lq/stq/lqarx/stqcx. coming in from decode1, two instructions get sent out to execute1. For these instructions, the RS or RT register gets modified on one of the iterations by setting the LSB of the register number. In LE mode, the first iteration uses RS|1 or RT|1 and the second iteration uses RS or RT. In BE mode, this is done the other way around. In order for decode2 to know what endianness is currently in use, we pass the big_endian flag down from icache through decode1 to decode2. This is always in sync with what execute1 is using because only rfid or an interrupt can change MSR[LE], and those operations all cause a flush and redirect. There is now an extra column in the decode tables in decode1 to indicate whether the instruction needs to be repeated. Decode1 also enforces the rule that lq with RT = RT and lqarx with RA = RT or RB = RT are illegal. Decode2 now passes a 'repeat' flag and a 'second' flag to execute1, and execute1 passes them on to loadstore1. The 'repeat' flag is set for both iterations of a repeated instruction, and 'second' is set on the second iteration. Execute1 does not take asynchronous or trace interrupts on the second iteration of a repeated instruction. Loadstore1 uses 'next_addr' for the second iteration of a repeated load/store so that we access the second doubleword of the memory operand. Thus loadstore1 accesses the doublewords in increasing memory order. For 16-byte loads this means that the first iteration writes GPR RT|1. It is possible that RA = RT|1 (this is a legal but non-preferred form), meaning that if the memory operand was misaligned, the first iteration would overwrite RA but then the second iteration might take a page fault, leading to corrupted state. To avoid that possibility, 16-byte loads in LE mode take an alignment interrupt if the operand is not 16-byte aligned. (This is the case anyway for lqarx, and we enforce it for lq as well.) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
function insn_dq (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_dx (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_to (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_bc (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_sh (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_me (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_mb (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_frt (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_fra (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_frb (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_frc (insn_in : std_ulogic_vector) return std_ulogic_vector;
function insn_u (insn_in : std_ulogic_vector) return std_ulogic_vector;
end package insn_helpers;
package body insn_helpers is
function insn_rs (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 21);
end;
function insn_rt (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 21);
end;
function insn_ra (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(20 downto 16);
end;
function insn_rb (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 11);
end;
function insn_rcreg (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(10 downto 6);
end;
function insn_si (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 0);
end;
function insn_ui (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 0);
end;
function insn_l (insn_in : std_ulogic_vector) return std_ulogic is
begin
return insn_in(21);
end;
function insn_sh32 (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 11);
end;
function insn_mb32 (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(10 downto 6);
end;
function insn_me32 (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(5 downto 1);
end;
function insn_li (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 2);
end;
function insn_lk (insn_in : std_ulogic_vector) return std_ulogic is
begin
return insn_in(0);
end;
function insn_aa (insn_in : std_ulogic_vector) return std_ulogic is
begin
return insn_in(1);
end;
function insn_rc (insn_in : std_ulogic_vector) return std_ulogic is
begin
return insn_in(0);
end;
Add basic XER support The carry is currently internal to execute1. We don't handle any of the other XER fields. This creates type called "xer_common_t" that contains the commonly used XER bits (CA, CA32, SO, OV, OV32). The value is stored in the CR file (though it could be a separate module). The rest of the bits will be implemented as a separate SPR and the two parts reconciled in mfspr/mtspr in latter commits. We always read XER in decode2 (there is little point not to) and send it down all pipeline branches as it will be needed in writeback for all type of instructions when CR0:SO needs to be updated (such forms exist for all pipeline branches even if we don't yet implement them). To avoid having to track XER hazards, we forward it back in EX1. This assumes that other pipeline branches that can modify it (mult and div) are running single issue for now. One additional hazard to beware of is an XER:SO modifying instruction in EX1 followed immediately by a store conditional. Due to our writeback latency, the store will go down the LSU with the previous XER value, thus the stcx. will set CR0:SO using an obsolete SO value. I doubt there exist any code relying on this behaviour being correct but we should account for it regardless, possibly by ensuring that stcx. remain single issue initially, or later by adding some minimal tracking or moving the LSU into the same pipeline as execute. Missing some obscure XER affecting instructions like addex or mcrxrx. [paulus@ozlabs.org - fix CA32 and OV32 for OP_ADD, fix order of arguments to set_ov] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
function insn_oe (insn_in : std_ulogic_vector) return std_ulogic is
begin
return insn_in(10);
end;
function insn_bd (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 2);
end;
function insn_bf (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 23);
end;
function insn_bfa (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(20 downto 18);
end;
function insn_cr (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(10 downto 1);
end;
function insn_bb (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 11);
end;
function insn_ba (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(20 downto 16);
end;
function insn_bt (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 21);
end;
function insn_fxm (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(19 downto 12);
end;
function insn_bo (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 21);
end;
function insn_bi (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(20 downto 16);
end;
function insn_bh (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(12 downto 11);
end;
function insn_d (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 0);
end;
function insn_ds (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 2);
end;
core: Implement quadword loads and stores This implements the lq, stq, lqarx and stqcx. instructions. These instructions all access two consecutive GPRs; for example the "lq %r6,0(%r3)" instruction will load the doubleword at the address in R3 into R7 and the doubleword at address R3 + 8 into R6. To cope with having two GPR sources or destinations, the instruction gets repeated at the decode2 stage, that is, for each lq/stq/lqarx/stqcx. coming in from decode1, two instructions get sent out to execute1. For these instructions, the RS or RT register gets modified on one of the iterations by setting the LSB of the register number. In LE mode, the first iteration uses RS|1 or RT|1 and the second iteration uses RS or RT. In BE mode, this is done the other way around. In order for decode2 to know what endianness is currently in use, we pass the big_endian flag down from icache through decode1 to decode2. This is always in sync with what execute1 is using because only rfid or an interrupt can change MSR[LE], and those operations all cause a flush and redirect. There is now an extra column in the decode tables in decode1 to indicate whether the instruction needs to be repeated. Decode1 also enforces the rule that lq with RT = RT and lqarx with RA = RT or RB = RT are illegal. Decode2 now passes a 'repeat' flag and a 'second' flag to execute1, and execute1 passes them on to loadstore1. The 'repeat' flag is set for both iterations of a repeated instruction, and 'second' is set on the second iteration. Execute1 does not take asynchronous or trace interrupts on the second iteration of a repeated instruction. Loadstore1 uses 'next_addr' for the second iteration of a repeated load/store so that we access the second doubleword of the memory operand. Thus loadstore1 accesses the doublewords in increasing memory order. For 16-byte loads this means that the first iteration writes GPR RT|1. It is possible that RA = RT|1 (this is a legal but non-preferred form), meaning that if the memory operand was misaligned, the first iteration would overwrite RA but then the second iteration might take a page fault, leading to corrupted state. To avoid that possibility, 16-byte loads in LE mode take an alignment interrupt if the operand is not 16-byte aligned. (This is the case anyway for lqarx, and we enforce it for lq as well.) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
function insn_dq (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 4);
end;
function insn_dx (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 6) & insn_in(20 downto 16) & insn_in(0);
end;
function insn_to (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 21);
end;
function insn_bc (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(10 downto 6);
end;
function insn_sh (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(1) & insn_in(15 downto 11);
end;
function insn_me (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(5) & insn_in(10 downto 6);
end;
function insn_mb (insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(5) & insn_in(10 downto 6);
end;
function insn_frt(insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(25 downto 21);
end;
function insn_fra(insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(20 downto 16);
end;
function insn_frb(insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 11);
end;
function insn_frc(insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(10 downto 6);
end;
function insn_u(insn_in : std_ulogic_vector) return std_ulogic_vector is
begin
return insn_in(15 downto 12);
end;
end package body insn_helpers;