Compare commits

..

64 Commits

Author SHA1 Message Date
Paul Mackerras 35e0dbed34
Merge pull request #353 from tianrui-wei/master
fix: fix icache_tb not finishing correctly
2 years ago
Michael Neuling cd52390bf1
Merge pull request #373 from antonblanchard/icache-insn-u-state
icache: Don't output X on i_out.insn
2 years ago
Michael Neuling b983d5080e
Merge pull request #376 from antonblanchard/loadstore-init
loadstore1: reduce U state being output
2 years ago
Michael Neuling d4db331467
Merge pull request #374 from antonblanchard/icache-unused-sig
core: Remove unused icache_inv signal
2 years ago
Michael Neuling ee5e3778ed
Merge pull request #364 from shenki/readme-updates
Readme updates
2 years ago
Michael Neuling c43692f4c7
Merge pull request #372 from antonblanchard/dcache-unused-sig
dcache: remove unused do_write signal
2 years ago
Michael Neuling 956df2c863
Merge pull request #371 from antonblanchard/unused-sig
execute1: sub_mux_sel and result_mux_sel are unused
2 years ago
Michael Neuling 3627f102db
Merge pull request #370 from antonblanchard/divider-init
divider: Fix d_out.overflow U state issue
2 years ago
Paul Mackerras 6e1e763c02
Merge pull request #368 from antonblanchard/icache-pmu-events
icache: Hook up PMU events
2 years ago
Anton Blanchard 1047239a37
Merge pull request #377 from antonblanchard/fpu-init
fpu: Reduce uninitialised signals
2 years ago
Anton Blanchard 9d35340bb1 fpu: Reduce uninitialised signals
Reduce uninitialised signals coming out of the FPU.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Michael Neuling b82eea5933
Merge pull request #366 from antonblanchard/hello-world-bss
Zero BSS in hello world test
2 years ago
Anton Blanchard d3aff67fa7
Merge pull request #375 from antonblanchard/core_debug-init
core_debug: Initialise gspr_index
2 years ago
Anton Blanchard b47b71821e loadstore1: reduce U state being output
While these signals should only be read when valid is true, they
are only a small number of bits and we want to reduce the amount of
U/X state bouncing around the chip.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 71d4b5ed20 core_debug: Initialise gspr_index
Another case of U state being driven out of a module.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard a527d9b959 core: Remove unused icache_inv signal
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard e7f0a7c7ac icache: Don't output X on i_out.insn
decode1 has a lot of logic that uses i_out.insn without first looking at
i_iout.valid. Play it safe and never output X state.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 39220be311 dcache: remove unused do_write signal
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 843361f2be execute1: sub_mux_sel and result_mux_sel are unused
Remove them.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard d3a7517318 divider: Fix d_out.overflow U state issue
While we should only look at this when d_out.valid = 1, we may as remove
some U state across interfaces.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 1ff852b012
Merge pull request #369 from antonblanchard/loadstore-pmu-init
loadstore1: Initialise PMU events
2 years ago
Anton Blanchard e2438071a1 loadstore1: Initialise PMU events
The loadstore1 PMU events are U state until a load and a store completes.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard b7c4d3c5c3
Merge pull request #367 from antonblanchard/fpu-typo
fpu: Fix capitalisation of Execute1ToFPUType
2 years ago
Anton Blanchard f06abb67ad icache: Hook up PMU events
We weren't connecting the icache PMU events up.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 64d2def0c6 fpu: Fix capitalisation of Execute1ToFPUType
While this is not an issue in VHDL, I noticed this when running
a script over the source and we may as well fix it.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard ff442d1bdb Zero BSS in hello world test
While trying to reduce U/X state issues, I notice that our BSS is not
being initialised in the hello world test.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard b8fc5636a4
Merge pull request #365 from antonblanchard/less-fpga-init
Remove some FPGA style signal inits
2 years ago
Anton Blanchard ebdddcc402 Remove some FPGA style signal inits
These don't work on the ASIC flow, so remove them and initialise
them explicitly where required.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard a750365ffa Remove some FPGA style signal inits
These don't work on the ASIC flow, so remove them and initialise
them explicitly where required.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Joel Stanley 9ec22af256 README: Add Linux on Microwatt instructions
These instructions are similar to those at

 https://ozlabs.org/~joel/microwatt/README

except they describe how to build the artifacts from scratch instead of
downloading them.

Signed-off-by: Joel Stanley <joel@jms.id.au>
2 years ago
Joel Stanley a31725d989 README: Add uart to fusesoc instructions
The SoC defaults to using the uart16550 so provide instructions on how
to fetch that library when seetting up fusesoc.

Also remove the text about a working directory; fusesoc doesn't need
one.

Signed-off-by: Joel Stanley <joel@jms.id.au>
2 years ago
Michael Neuling f5e06c2d4b
Merge pull request #361 from antonblanchard/alt-reset-address
Allow ALT_RESET_ADDRESS to be overridden
2 years ago
Anton Blanchard 948f6f43a7 Allow ALT_RESET_ADDRESS to be overridden
This allows us to boot from flash for example.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Michael Neuling 8bf48ac094
Merge pull request #360 from antonblanchard/log2ceil-issue
wishbone_bram_wrapper ram_addr_bits is 1 bit off
2 years ago
Anton Blanchard b5accb78b2 wishbone_bram_wrapper ram_addr_bits is 1 bit off
log2ceil() returns the number of bits required to store a value, so we
need to pass in memory_size-1, not memory_size.

Every other user of log2ceil() gets this right.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Michael Neuling 30fd936c12
Merge pull request #358 from antonblanchard/unused-sig
Remove unused sequential signal from Fetch1ToIcacheType
2 years ago
Michael Neuling af1b76d944
Merge pull request #356 from antonblanchard/fpu-constant
fpu: Make inverse_table a constant
2 years ago
Michael Neuling 9b96ab730c
Merge pull request #357 from antonblanchard/xics-warning
xics: Fix warning when comparing two std_ulogic_vectors
2 years ago
Anton Blanchard 0b39947f8d Remove unused sequential signal from Fetch1ToIcacheType
GHDL synthesis is flagging a warning about this.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 00bf0af21c xics: Fix warning when comparing two std_ulogic_vectors
Use unsigned() to make it clear what we are doing.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 50b4cb9423 fpu: Make inverse_table a constant
GHDL synthesis is complaining that inverse_table is never stored to.
Change it to a constant.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Tianrui Wei 844ca0e6b5
fix: fix icache_tb not finishing correctly
Setting icache to be privileged and accessing physical memory directly.
And set big_endian to 0 to correspond to the testbench result.

Signed-off-by: Tianrui Wei <tianrui@tianruiwei.com>
2 years ago
Michael Neuling f01f3d233a
Merge pull request #352 from mkj/static-urjtag
mw_debug: Add STATIC_URJTAG flag
2 years ago
Matt Johnston c0c00d05bc mw_debug: Add STATIC_URJTAG flag
Revert to linking dynamically by default, can statically link with
`make STATIC_URJTAG=1`

Fixes #351

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
2 years ago
Michael Neuling ffcdaaa92d
Update the README Issues (#350)
We've had these for a while now:
 - D/I cache
 - GPR bypassing
 - Supervisor state (and can boot linux)

We still need Vector/VMX/VSX (and probably some other things)

Signed-off-by: Michael Neuling <mikey@neuling.org>
2 years ago
Michael Neuling b4770197a2
Merge pull request #349 from madscientist159/master
Extend LiteDRAM VHDL wrapper to allow more than one clock line
2 years ago
Raptor Engineering Development Team fcb783a0fb Extend LiteDRAM VHDL wrapper to allow more than one clock line
This is necessary for the upcoming Arctic Tern system enablement,
since Arctic Tern uses two DRAM devices and a separate clock line
is routed to each device.  LiteX handles this behavior correctly,
therefore we assume other hardware exists that uses a similar
DRAM clock design.

Updates from Mikey to fix some compile issues.

Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
2 years ago
Michael Neuling 2b97fb0bf3
Merge pull request #348 from paulusmack/reduce
Reduce LUT usage
2 years ago
Paul Mackerras 0aa898c7a6 xics: Rework the irq_gen process
At present, the loop in the irq_gen process generates a chain of
comparators and other logic to work out the source number and priority
of the most-favoured (lowest priority number) pending interrupt.
This replaces that chain with (1) logic to generate an array of bits,
one per priority, indicating whether any interrupt is pending at that
priority, (2) a priority encoder to select the most favoured priority
with an interrupt pending, (3) logic to generate an array of bits, one
per source, indicating whether an interrupt is pending at the priority
calculated in step 2, and (4) a priority encoder to work out the
lowest numbered source that has an interrupt pending at the selected
priority.  This reduces LUT utilization.

The priority encoder function implemented here uses the optimized
count-leading-zeroes logic from helpers.vhdl.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 1720a0584a Use alternative count-leading-zeroes algorithm in the FPU and LSU
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 1086988883 countzero: Use alternative algorithm for higher bits
This implements an alternative count-leading-zeroes algorithm which
uses less LUTs to generate the higher-order bits (2..5) of the
result.

By doing (v | -v) rather than (v & -v), we get a value which has ones
from the MSB down to the rightmost 1 bit in v and then zeroes down to
the LSB.  This means that we can generate the MSB of the result (the
index of the rightmost 1 bit in v) just by looking at bits 63 and 31
of (v | -v), assuming that v is 64 bits.  Bit 4 of the result requires
looking at bits 63, 47, 31 and 15.  In contrast, each bit of the
result using (v & -v), which has a single 1, requires ORing together
32 bits.

It turns out that the minimum LUT usage comes from using (v & -v) to
generate bits 0 and 1 of the result, and using (v | -v) to generate
bits 2 to 5.  This saves almost 60 6-input LUTs on the Artix-7.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 4cf2921b0b soc: Re-do peripheral address decode to improve timing
This generates a series of io_cycle_* signals which are clean latches
and which become the 'cyc' signals of the wishbone buses going to
various peripherals (syscon, uarts, XICS, GPIO, etc.).  Effectively
this is done by moving the address decoding into the slave_io_latch
process.  The slave_io_type, which drives the multiplexer which
selects which wishbone to look for a response on, is reduced to just 8
values in the expectation that an 8-way multiplexer will use less
logic than one with more than 8 inputs.

With this timing is considerably better on the A7-100T.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Michael Neuling 27b660ef76
Merge pull request #346 from mkj/dmi_ecp5
Add DMI and mw_debug for ECP5
2 years ago
Anton Blanchard 5a5a082601
Merge pull request #343 from mikey/orange-crab-ci
ci: Add new Orange Crab build
2 years ago
Matt Johnston 9c64f8a98b mw_debug: Add Lattice ECP5 support
"-b ecp5" will select ECP5 interface that talks to a JTAGG
primitive.

For example with a FT232H JTAG board:

./mw_debug  -t 'ft2232 vid=0x0403 pid=0x6014'  -s 30000000 -b ecp5 mr ff003888 6
Connected to libftdi driver.
Found device ID: 0x41113043
00000000ff003888: 6d6f636c65570a0a  ..Welcom
00000000ff003890: 63694d206f742065  e to Mic
00000000ff003898: 2120747461776f72  rowatt !
00000000ff0038a0: 0000000000000a0a  ........
00000000ff0038a8: 67697320636f5320   Soc sig
00000000ff0038b0: 203a65727574616e  nature:
Core: running
 NIA: c0000000000187f8
 MSR: 9000000000001033

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
2 years ago
Matt Johnston 3775650df3 dmi_dtm_ecp5: Use ECP5 JTAGG for DMI
This uses the JTAGG primitive which is similar to BSCANE2.
The LUT4 delay approach came from Florian and Greg in
https://github.com/enjoy-digital/litex/pull/1087

Has been tested on an OrangeCrab with 48MHz sysclk
FT232H up to 30MHz (though libusb/urjtag is by far the bottleneck vs
the JTAG clock)

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
2 years ago
Matt Johnston eb20195a10 mw_debug: Link urjtag statically
liburjtag isn't in Debian, so usually we're pointing at a urjtag
build directory when building mw_debug

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
2 years ago
Matt Johnston 763138798e mw_debug: use isxdigit for hex arguments
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
2 years ago
Matt Johnston 04cc4a842c mw_debug: Add -s frequency argument
Chose -s for speed, vs -f for --force

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
2 years ago
Matt Johnston e05ae0c8cb mw_debug: pass target parameters to urjtag
An example

./mw_debug -d -t 'ft2232 vid=0x0403 pid=0x6014'

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
2 years ago
Paul Mackerras 49ec80ac3e fetch1/icache1: Remove the use_previous logic
This removes logic that I added some time ago with the thought that it
would enable us to do prefetching in the icache.  This logic detects
when the fetch address is an odd multiple of 4 and the next address in
sequence from the previous cycle.  In that case the instruction we
want is in the output register of the icache RAM already so there is
no need to do another read or any icache tag or TLB lookup.

However, this logic adds complexity, and removing it improves timing,
so this removes it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras cef3660e74
Merge pull request #345 from antonblanchard/popcnt-go-fast
popcnt* timing improvements from Paul
2 years ago
Paul Mackerras 2491aa7fc5 core: Make popcnt* take two cycles
This moves the calculation of the result for popcnt* into the
countbits unit, renamed from countzero, so that we can take two cycles
to get the result.  The motivation for this is that the popcnt*
calculation was showing up as a critical path.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Michael Neuling 286757f0f7 ci: Add new Orange Crab build
This builds the Orange Crab v0.21 + litedram image

Signed-off-by: Michael Neuling <mikey@neuling.org>
2 years ago

@ -72,7 +72,7 @@ jobs:
fail-fast: false
max-parallel: 2
matrix:
task: [ ECP5-EVN, ORANGE-CRAB ]
task: [ ECP5-EVN, ORANGE-CRAB, ORANGE-CRAB-0.21 ]
runs-on: ubuntu-latest
env:
DOCKER: 1

@ -55,18 +55,15 @@ all = core_tb icache_tb dcache_tb dmi_dtm_tb \

all: $(all)

base_core_files = decode_types.vhdl common.vhdl wishbone_types.vhdl fetch1.vhdl \
utils.vhdl plru.vhdl icache.vhdl \
core_files = decode_types.vhdl common.vhdl wishbone_types.vhdl fetch1.vhdl \
utils.vhdl plru.vhdl cache_ram.vhdl icache.vhdl \
decode1.vhdl helpers.vhdl insn_helpers.vhdl \
control.vhdl decode2.vhdl \
control.vhdl decode2.vhdl register_file.vhdl \
cr_file.vhdl crhelpers.vhdl ppc_fx_insns.vhdl rotator.vhdl \
logical.vhdl countzero.vhdl divider.vhdl execute1.vhdl \
logical.vhdl countbits.vhdl multiply.vhdl divider.vhdl execute1.vhdl \
loadstore1.vhdl mmu.vhdl dcache.vhdl writeback.vhdl core_debug.vhdl \
core.vhdl fpu.vhdl pmu.vhdl

core_files = $(base_core_files) register_file.vhdl cache_ram.vhdl multiply.vhdl
asic_core_files = $(base_core_files) asic/register_file.vhdl asic/cache_ram.vhdl asic/multiply.vhdl

soc_files = wishbone_arbiter.vhdl wishbone_bram_wrapper.vhdl sync_fifo.vhdl \
wishbone_debug_master.vhdl xics.vhdl syscon.vhdl gpio.vhdl soc.vhdl \
spi_rxtx.vhdl spi_flash_ctrl.vhdl
@ -163,7 +160,7 @@ ICACHE_NUM_LINES=4

clkgen=fpga/clk_gen_ecp5.vhd
toplevel=fpga/top-generic.vhdl
dmi_dtm=dmi_dtm_jtag.vhdl dmi_dtm_dummy.vhdl
dmi_dtm=dmi_dtm_dummy.vhdl
LITEDRAM_GHDL_ARG=

# OrangeCrab with ECP85 (original v0.0 with UM5G-85 chip)
@ -195,6 +192,7 @@ ECP_FLASH_OFFSET=0x80000
toplevel=fpga/top-orangecrab0.2.vhdl
litedram_target=orangecrab-85-0.2
soc_extra_v += litesdcard/generated/lattice/litesdcard_core.v
dmi_dtm=dmi_dtm_ecp5.vhdl
endif

# ECP5-EVN
@ -220,6 +218,7 @@ GHDL_IMAGE_GENERICS=-gMEMORY_SIZE=$(MEMORY_SIZE) -gRAM_INIT_FILE=$(RAM_INIT_FILE
-gRESET_LOW=$(RESET_LOW) -gCLK_INPUT=$(CLK_INPUT) -gCLK_FREQUENCY=$(CLK_FREQUENCY) -gICACHE_NUM_LINES=$(ICACHE_NUM_LINES) \
$(LITEDRAM_GHDL_ARG)


ifeq ($(FPGA_TARGET), verilator)
RESET_LOW=true
CLK_INPUT=50000000
@ -227,41 +226,22 @@ CLK_FREQUENCY=50000000
clkgen=fpga/clk_gen_bypass.vhd
endif

ifeq ($(FPGA_TARGET), caravel)
MEMORY_SIZE=4096
RESET_LOW=true
CLK_INPUT=100000000
CLK_FREQUENCY=100000000
endif

fpga_files = fpga/soc_reset.vhdl \
fpga/pp_fifo.vhd fpga/pp_soc_uart.vhd fpga/main_bram.vhdl \
nonrandom.vhdl

asic_files = fpga/pp_fifo.vhd fpga/pp_soc_uart.vhd asic/main_bram.vhdl \
asic/top-asic.vhdl $(dmi_dtm) nonrandom.vhdl

synth_files = $(core_files) $(soc_files) $(soc_extra_synth) $(fpga_files) $(clkgen) $(toplevel) $(dmi_dtm)

asic_synth_files = $(asic_core_files) $(soc_files) $(asic_files)

microwatt.json: $(synth_files) $(RAM_INIT_FILE)
$(YOSYS) $(GHDLSYNTH) -p "ghdl --std=08 --no-formal $(GHDL_IMAGE_GENERICS) $(synth_files) -e toplevel; read_verilog $(uart_files) $(soc_extra_v); synth_ecp5 -abc9 -nowidelut -json $@ $(SYNTH_ECP5_FLAGS)"

microwatt.v: $(synth_files) $(RAM_INIT_FILE)
$(YOSYS) $(GHDLSYNTH) -p "ghdl --std=08 --no-formal $(GHDL_IMAGE_GENERICS) $(synth_files) -e toplevel; write_verilog $@"

microwatt_asic.v: $(asic_synth_files)
$(YOSYS) $(GHDLSYNTH) -p "ghdl --std=08 --no-formal $(GHDL_IMAGE_GENERICS) $(asic_synth_files) -e toplevel; write_verilog $@"

microwatt-verilator: microwatt.v verilator/microwatt-verilator.cpp verilator/uart-verilator.c
$(VERILATOR) $(VERILATOR_FLAGS) -CFLAGS "$(VERILATOR_CFLAGS) -DCLK_FREQUENCY=$(CLK_FREQUENCY)" -Iuart16550 --assert --cc --exe --build $^ -o $@ -top-module toplevel
@cp -f obj_dir/microwatt-verilator microwatt-verilator

microwatt_asic-verilator: microwatt_asic.v asic/microwatt_asic-verilator.cpp verilator/uart-verilator.c verilator/jtag-verilator.c
$(VERILATOR) $(VERILATOR_FLAGS) -CFLAGS "$(VERILATOR_CFLAGS) -DCLK_FREQUENCY=$(CLK_FREQUENCY)" -Iuart16550 -Iasic/behavioural -Ijtag_tap --assert --cc --exe --build $^ -o $@ -top-module toplevel
@cp -f obj_dir/microwatt_asic-verilator microwatt_asic-verilator

microwatt_out.config: microwatt.json $(LPF)
$(NEXTPNR) --json $< --lpf $(LPF) --textcfg $@.tmp $(NEXTPNR_FLAGS) --package $(PACKAGE)
mv -f $@.tmp $@
@ -344,7 +324,6 @@ _clean:
rm -f scripts/mw_debug/mw_debug
rm -f microwatt.bin microwatt.json microwatt.svf microwatt_out.config
rm -f microwatt.v microwatt-verilator
rm -f microwatt_asic.v microwatt_asic-verilator
rm -rf obj_dir/

clean: _clean

@ -103,14 +103,8 @@ sudo dnf install fusesoc

```
fusesoc init
```

- Create a working directory and point FuseSoC at microwatt:

```
mkdir microwatt-fusesoc
cd microwatt-fusesoc
fusesoc library add microwatt /path/to/microwatt/
fusesoc fetch uart16550
fusesoc library add microwatt /path/to/microwatt
```

- Build using FuseSoC. For hello world (Replace nexys_video with your FPGA board such as --target=arty_a7-100):
@ -128,6 +122,68 @@ You should then be able to see output via the serial port of the board (/dev/tty
fusesoc run --target=nexys_video microwatt
```

## Linux on Microwatt

Mainline Linux supports Microwatt as of v5.14. The Arty A7 is the best tested
platform, but it's also been tested on the OrangeCrab and ButterStick.

1. Use buildroot to create a userspace

A small change is required to glibc in order to support the VMX/AltiVec-less
Microwatt, as float128 support is mandiatory and for this in GCC requires
VSX/AltiVec. This change is included in Joel's buildroot fork, along with a
defconfig:
```
git clone -b microwatt https://github.com/shenki/buildroot
cd buildroot
make ppc64le_microwatt_defconfig
make
```

The output is `output/images/rootfs.cpio`.

2. Build the Linux kernel
```
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
make ARCH=powerpc microwatt_defconfig
make ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- \
CONFIG_INITRAMFS_SOURCE=/buildroot/output/images/rootfs.cpio -j`nproc`
```

The output is `arch/powerpc/boot/dtbImage.microwatt.elf`.

3. Build gateware using FuseSoC

First configure FuseSoC as above.
```
fusesoc run --build --target=arty_a7-100 microwatt --no_bram --memory_size=0
```

The output is `build/microwatt_0/arty_a7-100-vivado/microwatt_0.bit`.

4. Program the flash

This operation will overwrite the contents of your flash.

For the Arty A7 A100, set `FLASH_ADDRESS` to `0x400000` and pass `-f a100`.

For the Arty A7 A35, set `FLASH_ADDRESS` to `0x300000` and pass `-f a35`.
```
microwatt/openocd/flash-arty -f a100 build/microwatt_0/arty_a7-100-vivado/microwatt_0.bit
microwatt/openocd/flash-arty -f a100 dtbImage.microwatt.elf -t bin -a $FLASH_ADDRESS
```

5. Connect to the second USB TTY device exposed by the FPGA

```
minicom -D /dev/ttyUSB1
```

The gateware has firmware that will look at `FLASH_ADDRESS` and attempt to
parse an ELF there, loading it to the address specified in the ELF header
and jumping to it.

## Testing

- A simple test suite containing random execution test cases and a couple of
@ -139,8 +195,5 @@ make -j$(nproc) check

## Issues

This is functional, but very simple. We still have quite a lot to do:

- There are a few instructions still to be implemented
- Need to add caches and bypassing (in progress)
- Need to add supervisor state (in progress)
- There are a few instructions still to be implemented:
- Vector/VMX/VSX

@ -1,24 +0,0 @@
module Microwatt_FP_DFFRFile (
`ifdef USE_POWER_PINS
inout VPWR,
inout VGND,
`endif
input [6:0] R1, R2, R3, RW,
input [63:0] DW,
output [63:0] D1, D2, D3,
input CLK,
input WE
);

reg [63:0] registers[0:79];

assign D1 = registers[R1];
assign D2 = registers[R2];
assign D3 = registers[R3];

always @(posedge CLK) begin
if (WE)
registers[RW] <= DW;
end

endmodule

@ -1,40 +0,0 @@
module RAM32_1RW1R #(
parameter BITS=5
) (
`ifdef USE_POWER_PINS
inout VPWR,
inout VGND,
`endif
input CLK,

input EN0,
input [BITS-1:0] A0,
input [7:0] WE0,
input [63:0] Di0,
output reg [63:0] Do0,

input EN1,
input [BITS-1:0] A1,
output reg [63:0] Do1
);

reg [63:0] RAM[2**BITS-1:0];

always @(posedge CLK) begin
if (EN1)
Do1 <= RAM[A1];
end

generate
genvar i;
for (i=0; i<8; i=i+1) begin: BYTE
always @(posedge CLK) begin
if (EN0) begin
if (WE0[i])
RAM[A0][i*8+7:i*8] <= Di0[i*8+7:i*8];
end
end
end
endgenerate

endmodule

@ -1,42 +0,0 @@
module RAM512 #(
parameter BITS=9,
parameter FILENAME="firmware.hex"
) (
`ifdef USE_POWER_PINS
inout VPWR,
inout VGND,
`endif
input CLK,
input [7:0] WE0,
input EN0,
input [63:0] Di0,
output reg [63:0] Do0,
input [BITS-1:0] A0
);

reg [63:0] RAM[2**BITS-1:0];

always @(posedge CLK) begin
if (EN0)
Do0 <= RAM[A0];
else
Do0 <= 64'b0;
end

generate
genvar i;
for (i=0; i<8; i=i+1) begin: BYTE
always @(posedge CLK) begin
if (EN0) begin
if (WE0[i])
RAM[A0][i*8+7:i*8] <= Di0[i*8+7:i*8];
end
end
end
endgenerate

initial begin
$readmemh(FILENAME, RAM);
end

endmodule

@ -1,24 +0,0 @@
module multiply_add_64x64
#(
parameter BITS=64
) (
`ifdef USE_POWER_PINS
inout VPWR,
inout VGND,
`endif
input clk,
input [BITS-1:0] a,
input [BITS-1:0] b,
input [BITS*2-1:0] c,
output [BITS*2-1:0] o
);
reg [BITS*2-1:0] o_tmp[2:0];

always @(posedge clk) begin
o_tmp[2] = o_tmp[1];
o_tmp[1] = o_tmp[0];
o_tmp[0] = (a * b) + c;
end

assign o = o_tmp[2];
endmodule

@ -1,99 +0,0 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;

entity cache_ram is
generic(
ROW_BITS : integer := 5;
WIDTH : integer := 64;
TRACE : boolean := false;
ADD_BUF : boolean := false
);

port(
clk : in std_logic;

rd_en : in std_logic;
rd_addr : in std_logic_vector(ROW_BITS - 1 downto 0);
rd_data : out std_logic_vector(WIDTH - 1 downto 0);

wr_sel : in std_logic_vector(WIDTH/8 - 1 downto 0);
wr_addr : in std_logic_vector(ROW_BITS - 1 downto 0);
wr_data : in std_logic_vector(WIDTH - 1 downto 0)
);

end cache_ram;

architecture rtl of cache_ram is
component RAM32_1RW1R port(
CLK : in std_logic;

EN0 : in std_logic;
A0 : in std_logic_vector(4 downto 0);
WE0 : in std_logic_vector(7 downto 0);
Di0 : in std_logic_vector(63 downto 0);
Do0 : out std_logic_vector(63 downto 0);

EN1 : in std_logic;
A1 : in std_logic_vector(4 downto 0);
Do1 : out std_logic_vector(63 downto 0)
);
end component;

signal wr_enable: std_logic;
signal rd_data0_tmp : std_logic_vector(WIDTH - 1 downto 0);
signal rd_data0_saved : std_logic_vector(WIDTH - 1 downto 0);
signal rd_data0 : std_logic_vector(WIDTH - 1 downto 0);
signal rd_en_prev: std_ulogic;
begin
assert (ROW_BITS = 5) report "ROW_BITS must be 5" severity FAILURE;
assert (WIDTH = 64) report "Must be 64 bit" severity FAILURE;
assert (TRACE = false) report "Trace not supported" severity FAILURE;

wr_enable <= or(wr_sel);

cache_ram_0 : RAM32_1RW1R
port map (
CLK => clk,

EN0 => wr_enable,
A0 => wr_addr,
WE0 => wr_sel,
Di0 => wr_data,
Do0 => open,

EN1 => rd_en,
A1 => rd_addr,
Do1 => rd_data0_tmp
);

-- The caches rely on cache_ram latching the last read. Handle it here
-- for now.
process(clk)
begin
if rising_edge(clk) then
rd_en_prev <= rd_en;
if rd_en_prev = '1' then
rd_data0_saved <= rd_data0_tmp;
end if;
end if;
end process;
rd_data0 <= rd_data0_tmp when rd_en_prev = '1' else rd_data0_saved;

buf: if ADD_BUF generate
begin
process(clk)
begin
if rising_edge(clk) then
rd_data <= rd_data0;
end if;
end process;
end generate;

nobuf: if not ADD_BUF generate
begin
rd_data <= rd_data0;
end generate;

end architecture rtl;

@ -1,63 +0,0 @@
library ieee;
use ieee.std_logic_1164.all;

library work;

entity main_bram is
generic(
WIDTH : natural := 64;
HEIGHT_BITS : natural := 11;
MEMORY_SIZE : natural := (8*1024);
RAM_INIT_FILE : string
);
port(
clk : in std_logic;
addr : in std_logic_vector(HEIGHT_BITS - 1 downto 0) ;
din : in std_logic_vector(WIDTH-1 downto 0);
dout : out std_logic_vector(WIDTH-1 downto 0);
sel : in std_logic_vector((WIDTH/8)-1 downto 0);
re : in std_ulogic;
we : in std_ulogic
);
end entity main_bram;

architecture behaviour of main_bram is
component RAM512 port (
CLK : in std_ulogic;
WE0 : in std_ulogic_vector(7 downto 0);
EN0 : in std_ulogic;
Di0 : in std_ulogic_vector(63 downto 0);
Do0 : out std_ulogic_vector(63 downto 0);
A0 : in std_ulogic_vector(8 downto 0)
);
end component;

signal sel_qual: std_ulogic_vector((WIDTH/8)-1 downto 0);

signal obuf : std_logic_vector(WIDTH-1 downto 0);
begin
assert (WIDTH = 64) report "Must be 64 bit" severity FAILURE;
-- Do we have a log2 round up issue here?
assert (HEIGHT_BITS = 10) report "HEIGHT_BITS must be 10" severity FAILURE;
assert (MEMORY_SIZE = 4096) report "MEMORY_SIZE must be 4096" severity FAILURE;

sel_qual <= sel when we = '1' else (others => '0');

memory_0 : RAM512
port map (
CLK => clk,
WE0 => sel_qual(7 downto 0),
EN0 => re or we,
Di0 => din(63 downto 0),
Do0 => obuf(63 downto 0),
A0 => addr(8 downto 0)
);

-- The wishbone BRAM wrapper assumes a 1 cycle delay
memory_read_buffer: process(clk)
begin
if rising_edge(clk) then
dout <= obuf;
end if;
end process;
end architecture behaviour;

@ -1,83 +0,0 @@
#include <stdlib.h>
#include "Vtoplevel.h"
#include "verilated.h"
#include "verilated_vcd_c.h"

/*
* Current simulation time
* This is a 64-bit integer to reduce wrap over issues and
* allow modulus. You can also use a double, if you wish.
*/
vluint64_t main_time = 0;

/*
* Called by $time in Verilog
* converts to double, to match
* what SystemC does
*/
double sc_time_stamp(void)
{
return main_time;
}

#if VM_TRACE
VerilatedVcdC *tfp;
#endif

void tick(Vtoplevel *top)
{
top->ext_clk = 1;
top->eval();
#if VM_TRACE
if (tfp)
tfp->dump((double) main_time);
#endif
main_time++;

top->ext_clk = 0;
top->eval();
#if VM_TRACE
if (tfp)
tfp->dump((double) main_time);
#endif
main_time++;
}

void uart_tx(unsigned char tx);
unsigned char uart_rx(void);

int main(int argc, char **argv)
{
Verilated::commandArgs(argc, argv);

// init top verilog instance
Vtoplevel* top = new Vtoplevel;

#if VM_TRACE
// init trace dump
Verilated::traceEverOn(true);
tfp = new VerilatedVcdC;
top->trace(tfp, 99);
tfp->open("microwatt-verilator.vcd");
#endif

// Reset
top->ext_rst = 0;
for (unsigned long i = 0; i < 5; i++)
tick(top);
top->ext_rst = 1;

while(!Verilated::gotFinish()) {
tick(top);

uart_tx(top->uart0_txd);
top->uart0_rxd = uart_rx();
}

#if VM_TRACE
tfp->close();
delete tfp;
#endif

delete top;
}

@ -1,128 +0,0 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library work;
use work.common.all;

-- XXX We should be able to make timing with a 2 cycle multiplier
entity multiply is
generic (
PIPELINE_DEPTH : natural := 4
);
port (
clk : in std_logic;

m_in : in MultiplyInputType;
m_out : out MultiplyOutputType
);
end entity multiply;

architecture behaviour of multiply is
signal m: MultiplyInputType := MultiplyInputInit;

type multiply_pipeline_stage is record
valid : std_ulogic;
is_32bit : std_ulogic;
not_res : std_ulogic;
end record;
constant MultiplyPipelineStageInit : multiply_pipeline_stage := (valid => '0',
is_32bit => '0',
not_res => '0');

type multiply_pipeline_type is array(0 to PIPELINE_DEPTH-1) of multiply_pipeline_stage;
constant MultiplyPipelineInit : multiply_pipeline_type := (others => MultiplyPipelineStageInit);

type reg_type is record
multiply_pipeline : multiply_pipeline_type;
end record;

signal r, rin : reg_type := (multiply_pipeline => MultiplyPipelineInit);
signal overflow : std_ulogic;
signal ovf_in : std_ulogic;

signal mult_out : std_logic_vector(127 downto 0);

component multiply_add_64x64 port(
clk : in std_logic;
a : in std_logic_vector(63 downto 0);
b : in std_logic_vector(63 downto 0);
c : in std_logic_vector(127 downto 0);
o : out std_logic_vector(127 downto 0)
);
end component;
begin
multiply_0: process(clk)
begin
if rising_edge(clk) then
m <= m_in;
r <= rin;
overflow <= ovf_in;
end if;
end process;

multiplier : multiply_add_64x64
port map (
clk => clk,
a => m.data1,
b => m.data2,
c => m.addend,
o => mult_out
);

multiply_1: process(all)
variable v : reg_type;
variable d : std_ulogic_vector(127 downto 0);
variable d2 : std_ulogic_vector(63 downto 0);
variable ov : std_ulogic;
begin
v := r;
v.multiply_pipeline(0).valid := m.valid;
v.multiply_pipeline(0).is_32bit := m.is_32bit;
v.multiply_pipeline(0).not_res := m.not_result;

loop_0: for i in 1 to PIPELINE_DEPTH-1 loop
v.multiply_pipeline(i) := r.multiply_pipeline(i-1);
end loop;

if v.multiply_pipeline(PIPELINE_DEPTH-1).not_res = '1' then
d := not mult_out;
else
d := mult_out;
end if;

ov := '0';
if v.multiply_pipeline(PIPELINE_DEPTH-1).is_32bit = '1' then
ov := (or d(63 downto 31)) and not (and d(63 downto 31));
else
ov := (or d(127 downto 63)) and not (and d(127 downto 63));
end if;
ovf_in <= ov;

m_out.result <= d;
m_out.overflow <= overflow;
m_out.valid <= v.multiply_pipeline(PIPELINE_DEPTH-1).valid;

rin <= v;
end process;
end architecture behaviour;


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity short_multiply is
port (
clk : in std_ulogic;

a_in : in std_ulogic_vector(15 downto 0);
b_in : in std_ulogic_vector(15 downto 0);
m_out : out std_ulogic_vector(31 downto 0)
);
end entity short_multiply;

architecture behaviour of short_multiply is
begin
m_out <= std_ulogic_vector(signed(a_in) * signed(b_in));
end architecture behaviour;

@ -1,103 +0,0 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library work;
use work.common.all;

entity register_file is
generic (
SIM : boolean := false;
HAS_FPU : boolean := true;
LOG_LENGTH : natural := 0
);
port(
clk : in std_logic;

d_in : in Decode2ToRegisterFileType;
d_out : out RegisterFileToDecode2Type;

w_in : in WritebackToRegisterFileType;

dbg_gpr_req : in std_ulogic;
dbg_gpr_ack : out std_ulogic;
dbg_gpr_addr : in gspr_index_t;
dbg_gpr_data : out std_ulogic_vector(63 downto 0);

sim_dump : in std_ulogic;
sim_dump_done : out std_ulogic;

log_out : out std_ulogic_vector(71 downto 0)
);
end entity register_file;

architecture behaviour of register_file is
component Microwatt_FP_DFFRFile port (
CLK : in std_ulogic;

R1 : in std_ulogic_vector(6 downto 0);
R2 : in std_ulogic_vector(6 downto 0);
R3 : in std_ulogic_vector(6 downto 0);

D1 : out std_ulogic_vector(63 downto 0);
D2 : out std_ulogic_vector(63 downto 0);
D3 : out std_ulogic_vector(63 downto 0);

WE : in std_ulogic;
RW : in std_ulogic_vector(6 downto 0);
DW : in std_ulogic_vector(63 downto 0)
);
end component;

signal d1: std_ulogic_vector(63 downto 0);
signal d2: std_ulogic_vector(63 downto 0);
signal d3: std_ulogic_vector(63 downto 0);
begin

register_file_0 : Microwatt_FP_DFFRFile
port map (
CLK => clk,

R1 => d_in.read1_reg,
R2 => d_in.read2_reg,
R3 => d_in.read3_reg,

D1 => d1,
D2 => d2,
D3 => d3,

WE => w_in.write_enable,
RW => w_in.write_reg,
DW => w_in.write_data
);

x_state_check: process(clk)
begin
if rising_edge(clk) then
if w_in.write_enable = '1' then
assert not(is_x(w_in.write_data)) and not(is_x(w_in.write_reg)) severity failure;
end if;
end if;
end process x_state_check;

-- Forward any written data
register_read_0: process(all)
begin
d_out.read1_data <= d1;
d_out.read2_data <= d2;
d_out.read3_data <= d3;

if w_in.write_enable = '1' then
if d_in.read1_reg = w_in.write_reg then
d_out.read1_data <= w_in.write_data;
end if;
if d_in.read2_reg = w_in.write_reg then
d_out.read2_data <= w_in.write_data;
end if;
if d_in.read3_reg = w_in.write_reg then
d_out.read3_data <= w_in.write_data;
end if;
end if;
end process register_read_0;

end architecture behaviour;

@ -1,151 +0,0 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library work;
use work.wishbone_types.all;

entity toplevel is
generic (
MEMORY_SIZE : integer := 8192;
RAM_INIT_FILE : string := "firmware.hex";
RESET_LOW : boolean := true;
CLK_INPUT : positive := 100000000;
CLK_FREQUENCY : positive := 100000000;
HAS_FPU : boolean := false;
HAS_BTC : boolean := false;
NO_BRAM : boolean := false;
DISABLE_FLATTEN_CORE : boolean := false;
ALT_RESET_ADDRESS : std_logic_vector(63 downto 0) := (27 downto 0 => '0', others => '1');
SPI_FLASH_OFFSET : integer := 0;
SPI_FLASH_DEF_CKDV : natural := 4;
SPI_FLASH_DEF_QUAD : boolean := false;
SPI_BOOT_CLOCKS : boolean := false;
LOG_LENGTH : natural := 0;
UART_IS_16550 : boolean := true;
HAS_UART1 : boolean := false;
HAS_JTAG : boolean := true;
ICACHE_NUM_LINES : natural := 4;
ICACHE_NUM_WAYS : natural := 1;
ICACHE_TLB_SIZE : natural := 4;
DCACHE_NUM_LINES : natural := 4;
DCACHE_NUM_WAYS : natural := 1;
DCACHE_TLB_SET_SIZE : natural := 2;
DCACHE_TLB_NUM_WAYS : natural := 2;
HAS_GPIO : boolean := true;
NGPIO : natural := 32
);
port(
ext_clk : in std_ulogic;
ext_rst : in std_ulogic;

-- UART0 signals:
uart0_txd : out std_ulogic;
uart0_rxd : in std_ulogic;

-- UART1 signals:
uart1_txd : out std_ulogic;
uart1_rxd : in std_ulogic;

-- SPI
spi_flash_cs_n : out std_ulogic;
spi_flash_clk : out std_ulogic;
spi_flash_sdat_i : in std_ulogic_vector(3 downto 0);
spi_flash_sdat_o : out std_ulogic_vector(3 downto 0);
spi_flash_sdat_oe : out std_ulogic_vector(3 downto 0);

-- JTAG signals:
jtag_tck : in std_ulogic;
jtag_tdi : in std_ulogic;
jtag_tms : in std_ulogic;
jtag_trst : in std_ulogic;
jtag_tdo : out std_ulogic;

-- GPIO
gpio_in : in std_ulogic_vector(NGPIO - 1 downto 0);
gpio_out : out std_ulogic_vector(NGPIO - 1 downto 0);
gpio_dir : out std_ulogic_vector(NGPIO - 1 downto 0);

-- Add an I/O pin to select fetching from flash on reset
alt_reset : in std_ulogic
);
end entity toplevel;

architecture behaviour of toplevel is
-- reset signals
signal system_rst : std_ulogic;
begin

system_rst <= not ext_rst when RESET_LOW else ext_rst;

-- Main SoC
soc0: entity work.soc
generic map(
MEMORY_SIZE => MEMORY_SIZE,
RAM_INIT_FILE => RAM_INIT_FILE,
SIM => false,
CLK_FREQ => CLK_FREQUENCY,
HAS_FPU => HAS_FPU,
HAS_BTC => HAS_BTC,
HAS_DRAM => false,
DRAM_SIZE => 0,
DRAM_INIT_SIZE => 0,
DISABLE_FLATTEN_CORE => DISABLE_FLATTEN_CORE,
ALT_RESET_ADDRESS => ALT_RESET_ADDRESS,
HAS_SPI_FLASH => true,
SPI_FLASH_DLINES => 4,
SPI_FLASH_OFFSET => SPI_FLASH_OFFSET,
SPI_FLASH_DEF_CKDV => SPI_FLASH_DEF_CKDV,
SPI_FLASH_DEF_QUAD => SPI_FLASH_DEF_QUAD,
SPI_BOOT_CLOCKS => SPI_BOOT_CLOCKS,
LOG_LENGTH => LOG_LENGTH,
UART0_IS_16550 => UART_IS_16550,
HAS_UART1 => HAS_UART1,
HAS_JTAG => HAS_JTAG,
HAS_GPIO => HAS_GPIO,
NGPIO => NGPIO,
ICACHE_NUM_LINES => ICACHE_NUM_LINES,
ICACHE_NUM_WAYS => ICACHE_NUM_WAYS,
ICACHE_TLB_SIZE => ICACHE_TLB_SIZE,
DCACHE_NUM_LINES => DCACHE_NUM_LINES,
DCACHE_NUM_WAYS => DCACHE_NUM_WAYS,
DCACHE_TLB_SET_SIZE => DCACHE_TLB_SET_SIZE,
DCACHE_TLB_NUM_WAYS => DCACHE_TLB_NUM_WAYS
)
port map (
-- System signals
system_clk => ext_clk,
rst => system_rst,

-- UART signals
uart0_txd => uart0_txd,
uart0_rxd => uart0_rxd,

-- UART1 signals
uart1_txd => uart1_txd,
uart1_rxd => uart1_rxd,

-- SPI signals
spi_flash_sck => spi_flash_clk,
spi_flash_cs_n => spi_flash_cs_n,
spi_flash_sdat_o => spi_flash_sdat_o,
spi_flash_sdat_oe => spi_flash_sdat_oe,
spi_flash_sdat_i => spi_flash_sdat_i,

-- JTAG signals
jtag_tck => jtag_tck,
jtag_tdi => jtag_tdi,
jtag_tms => jtag_tms,
jtag_trst => jtag_trst,
jtag_tdo => jtag_tdo,

-- GPIO signals
gpio_in => gpio_in,
gpio_out => gpio_out,
gpio_dir => gpio_dir,

-- Reset PC to flash offset 0 (ie 0xf000000)
alt_reset => alt_reset
);

end architecture behaviour;

@ -1,71 +0,0 @@
#!/usr/bin/python

import argparse
import re

module_regex = r'[a-zA-Z0-9_:\.\\]+'

# match:
# module dcache(clk, rst, d_in, m_in, wishbone_in, d_out, m_out, stall_out, wishbone_out);
# A bit of a hack - ignore anything contining a '`', and assume that means we've already
# processed this module in a previous run. This helps when having to run this script
# multiple times for different power names.
multiline_module_re = re.compile(r'module\s+(' + module_regex + r')\(([^`]*?)\);', re.DOTALL)
module_re = re.compile(r'module\s+(' + module_regex + r')\((.*?)\);')

# match:
# dcache_64_2_2_2_2_12_0 dcache_0 (
hookup_re = re.compile(r'\s+(' + module_regex + r') ' + module_regex + r'\s+\(')

header1 = """\
`ifdef USE_POWER_PINS
{power}, {ground}, `endif\
"""

header2 = """\
`ifdef USE_POWER_PINS
inout {power};
inout {ground};
`endif\
"""

header3 = """\
`ifdef USE_POWER_PINS
.{power}({parent_power}),
.{ground}({parent_ground}),
`endif\
"""

parser = argparse.ArgumentParser(description='Insert power and ground into verilog modules')
parser.add_argument('--power', default='VPWR', help='POWER net name (default VPWR)')
parser.add_argument('--ground', default='VGND', help='POWER net name (default VGND)')
parser.add_argument('--parent-power', default='VPWR', help='POWER net name of parent module (default VPWR)')
parser.add_argument('--parent-ground', default='VGND', help='POWER net name of parent module (default VGND)')
parser.add_argument('--verilog', required=True, help='Verilog file to modify')
parser.add_argument('--module', required=True, action='append', help='Module to replace (can be specified multiple times')

args = parser.parse_args()

with open(args.verilog, 'r') as f:
d = f.read()
# Remove newlines from module definitions, yosys started doing this as of
# commit ff8e999a7112 ("Split module ports, 20 per line")
fixed = multiline_module_re.sub(lambda m: m.group(0).replace("\n", ""), d)

for line in fixed.splitlines():
m = module_re.match(line)
m2 = hookup_re.match(line)
if m and m.group(1) in args.module:
module_name = m.group(1)
module_args = m.group(2)
print('module %s(' % (module_name))
print("")
print(header1.format(power=args.power, ground=args.ground))
print(' %s);' % module_args)
print(header2.format(power=args.power, ground=args.ground))
elif m2 and m2.group(1) in args.module:
print(line)
print(header3.format(parent_power=args.parent_power, parent_ground=args.parent_ground, power=args.power, ground=args.ground))
else:
print(line)

@ -1,36 +0,0 @@
#!/bin/bash -e

# process microwatt verilog

FILE_IN=microwatt_asic.v
FILE_OUT=microwatt_asic_processed.v

# Rename top level
sed 's/toplevel/microwatt/' < $FILE_IN > $FILE_OUT

# Add power to all macros, and route power in microwatt down to them
caravel/insert_power.py --verilog=$FILE_OUT --parent-power=vccd1 --parent-ground=vssd1 --power=vccd1 --ground=vssd1 --module=microwatt --module=core_0_4_1_4_4_1_2_2_92a0c5f888148ab9da14068b971849437301064b --module=execute1_0_9508e90548b0440a4a61e5743b76c1e309b23b7f --module=multiply_4 --module=soc_4096_100000000_0_0_4_0_4_0_4_1_4_4_1_2_2_32_ed0b68172790179612c5bea419d574732b13cc2a --module=icache_64_8_4_1_4_12_0_5ba93c9db0cff93f52b521d7420e43f6eda2784f --module=dcache_64_4_1_2_2_12_0 --module=cache_ram_5_64_1489f923c4dca729178b3e3233458550d8dddf29 --module=main_bram_64_10_4096_a75adb9e07879fb6c63b494abe06e3f9a6bb2ed9 --module=register_file_0_1489f923c4dca729178b3e3233458550d8dddf29 --module=wishbone_bram_wrapper_4096_a75adb9e07879fb6c63b494abe06e3f9a6bb2ed9 > ${FILE_OUT}.tmp1

# Hard macros use VPWR/VGND
caravel/insert_power.py --verilog=${FILE_OUT}.tmp1 --parent-power=vccd1 --parent-ground=vssd1 --power=VPWR --ground=VGND --module=Microwatt_FP_DFFRFile --module=multiply_add_64x64 --module=RAM32_1RW1R --module=RAM512 > ${FILE_OUT}.tmp2

mv ${FILE_OUT}.tmp2 ${FILE_OUT}
rm ${FILE_OUT}.tmp1

# Add defines
sed -i '1 a\
\
/* JTAG */\
`include "tap_top.v"\
\
/* UART */\
`include "raminfr.v"\
`include "uart_receiver.v"\
`include "uart_rfifo.v"\
`include "uart_tfifo.v"\
`include "uart_transmitter.v"\
`include "uart_defines.v"\
`include "uart_regs.v"\
`include "uart_sync_flops.v"\
`include "uart_wb.v"\
`include "uart_top.v"' $FILE_OUT

@ -8,7 +8,7 @@ use work.decode_types.all;

package common is
-- Processor Version Number
constant PVR_MICROWATT : std_ulogic_vector(31 downto 0) := x"00630101";
constant PVR_MICROWATT : std_ulogic_vector(31 downto 0) := x"00630000";

-- MSR bit numbers
constant MSR_SF : integer := (63 - 0); -- Sixty-Four bit mode
@ -200,7 +200,6 @@ package common is
priv_mode : std_ulogic;
big_endian : std_ulogic;
stop_mark: std_ulogic;
sequential: std_ulogic;
predicted : std_ulogic;
pred_ntaken : std_ulogic;
nia: std_ulogic_vector(63 downto 0);

@ -64,8 +64,8 @@ architecture rtl of control is

signal r_int, rin_int : reg_internal_type := reg_internal_init;

signal gpr_write_valid : std_ulogic := '0';
signal cr_write_valid : std_ulogic := '0';
signal gpr_write_valid : std_ulogic;
signal cr_write_valid : std_ulogic;

type tag_register is record
wr_gpr : std_ulogic;
@ -245,6 +245,8 @@ begin
end if;

if rst = '1' then
gpr_write_valid <= '0';
cr_write_valid <= '0';
v_int := reg_internal_init;
valid_tmp := '0';
end if;

@ -117,21 +117,20 @@ architecture behave of core is
signal complete: instr_tag_t;
signal terminate: std_ulogic;
signal core_rst: std_ulogic;
signal icache_inv: std_ulogic;
signal do_interrupt: std_ulogic;

-- Delayed/Latched resets and alt_reset
signal rst_fetch1 : std_ulogic := '1';
signal rst_fetch2 : std_ulogic := '1';
signal rst_icache : std_ulogic := '1';
signal rst_dcache : std_ulogic := '1';
signal rst_dec1 : std_ulogic := '1';
signal rst_dec2 : std_ulogic := '1';
signal rst_ex1 : std_ulogic := '1';
signal rst_fpu : std_ulogic := '1';
signal rst_ls1 : std_ulogic := '1';
signal rst_wback : std_ulogic := '1';
signal rst_dbg : std_ulogic := '1';
signal rst_fetch1 : std_ulogic;
signal rst_fetch2 : std_ulogic;
signal rst_icache : std_ulogic;
signal rst_dcache : std_ulogic;
signal rst_dec1 : std_ulogic;
signal rst_dec2 : std_ulogic;
signal rst_ex1 : std_ulogic;
signal rst_fpu : std_ulogic;
signal rst_ls1 : std_ulogic;
signal rst_wback : std_ulogic;
signal rst_dbg : std_ulogic;
signal alt_reset_d : std_ulogic;

signal sim_cr_dump: std_ulogic;

@ -154,6 +154,7 @@ begin
stopping <= '0';
terminated <= '0';
log_trigger_delay <= 0;
gspr_index <= (others => '0');
else
if do_log_trigger = '1' or log_trigger_delay /= 0 then
if log_trigger_delay = 255 then

@ -121,6 +121,7 @@ begin
DRAM_ABITS => 24,
DRAM_ALINES => 1,
DRAM_DLINES => 16,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 128,
PAYLOAD_FILE => DRAM_INIT_FILE,
PAYLOAD_SIZE => ROM_SIZE

@ -0,0 +1,136 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library work;
use work.helpers.all;

entity bit_counter is
port (
clk : in std_logic;
rs : in std_ulogic_vector(63 downto 0);
count_right : in std_ulogic;
do_popcnt : in std_ulogic;
is_32bit : in std_ulogic;
datalen : in std_ulogic_vector(3 downto 0);
result : out std_ulogic_vector(63 downto 0)
);
end entity bit_counter;

architecture behaviour of bit_counter is
-- signals for count-leading/trailing-zeroes
signal inp : std_ulogic_vector(63 downto 0);
signal inp_r : std_ulogic_vector(63 downto 0);
signal sum : std_ulogic_vector(64 downto 0);
signal sum_r : std_ulogic_vector(64 downto 0);
signal onehot : std_ulogic_vector(63 downto 0);
signal edge : std_ulogic_vector(63 downto 0);
signal bitnum : std_ulogic_vector(5 downto 0);
signal cntz : std_ulogic_vector(63 downto 0);

-- signals for popcnt
signal dlen_r : std_ulogic_vector(3 downto 0);
signal pcnt_r : std_ulogic;
subtype twobit is unsigned(1 downto 0);
type twobit32 is array(0 to 31) of twobit;
signal pc2 : twobit32;
subtype threebit is unsigned(2 downto 0);
type threebit16 is array(0 to 15) of threebit;
signal pc4 : threebit16;
subtype fourbit is unsigned(3 downto 0);
type fourbit8 is array(0 to 7) of fourbit;
signal pc8 : fourbit8;
signal pc8_r : fourbit8;
subtype sixbit is unsigned(5 downto 0);
type sixbit2 is array(0 to 1) of sixbit;
signal pc32 : sixbit2;
signal popcnt : std_ulogic_vector(63 downto 0);

begin
countzero_r: process(clk)
begin
if rising_edge(clk) then
inp_r <= inp;
sum_r <= sum;
end if;
end process;

countzero: process(all)
variable bitnum_e, bitnum_o : std_ulogic_vector(5 downto 0);
begin
if is_32bit = '0' then
if count_right = '0' then
inp <= bit_reverse(rs);
else
inp <= rs;
end if;
else
inp(63 downto 32) <= x"FFFFFFFF";
if count_right = '0' then
inp(31 downto 0) <= bit_reverse(rs(31 downto 0));
else
inp(31 downto 0) <= rs(31 downto 0);
end if;
end if;

sum <= std_ulogic_vector(unsigned('0' & not inp) + 1);

-- The following occurs after a clock edge
edge <= sum_r(63 downto 0) or inp_r;
bitnum_e := edgelocation(edge, 6);
onehot <= sum_r(63 downto 0) and inp_r;
bitnum_o := bit_number(onehot);
bitnum(5 downto 2) <= bitnum_e(5 downto 2);
bitnum(1 downto 0) <= bitnum_o(1 downto 0);

cntz <= 57x"0" & sum_r(64) & bitnum;
end process;

popcnt_r: process(clk)
begin
if rising_edge(clk) then
for i in 0 to 7 loop
pc8_r(i) <= pc8(i);
end loop;
dlen_r <= datalen;
pcnt_r <= do_popcnt;
end if;
end process;

popcnt_a: process(all)
begin
for i in 0 to 31 loop
pc2(i) <= unsigned("0" & rs(i * 2 downto i * 2)) + unsigned("0" & rs(i * 2 + 1 downto i * 2 + 1));
end loop;
for i in 0 to 15 loop
pc4(i) <= ('0' & pc2(i * 2)) + ('0' & pc2(i * 2 + 1));
end loop;
for i in 0 to 7 loop
pc8(i) <= ('0' & pc4(i * 2)) + ('0' & pc4(i * 2 + 1));
end loop;

-- after a clock edge
for i in 0 to 1 loop
pc32(i) <= ("00" & pc8_r(i * 4)) + ("00" & pc8_r(i * 4 + 1)) +
("00" & pc8_r(i * 4 + 2)) + ("00" & pc8_r(i * 4 + 3));
end loop;
popcnt <= (others => '0');
if dlen_r(3 downto 2) = "00" then
-- popcntb
for i in 0 to 7 loop
popcnt(i * 8 + 3 downto i * 8) <= std_ulogic_vector(pc8_r(i));
end loop;
elsif dlen_r(3) = '0' then
-- popcntw
for i in 0 to 1 loop
popcnt(i * 32 + 5 downto i * 32) <= std_ulogic_vector(pc32(i));
end loop;
else
popcnt(6 downto 0) <= std_ulogic_vector(('0' & pc32(0)) + ('0' & pc32(1)));
end if;
end process;

result <= cntz when pcnt_r = '0' else popcnt;

end behaviour;

@ -11,11 +11,11 @@ use work.common.all;
library osvvm;
use osvvm.RandomPkg.all;

entity countzero_tb is
entity countbits_tb is
generic (runner_cfg : string := runner_cfg_default);
end countzero_tb;
end countbits_tb;

architecture behave of countzero_tb is
architecture behave of countbits_tb is
constant clk_period: time := 10 ns;
signal rs: std_ulogic_vector(63 downto 0);
signal is_32bit, count_right: std_ulogic := '0';
@ -23,13 +23,15 @@ architecture behave of countzero_tb is
signal clk: std_ulogic;

begin
zerocounter_0: entity work.zero_counter
bitcounter_0: entity work.bit_counter
port map (
clk => clk,
rs => rs,
result => res,
count_right => count_right,
is_32bit => is_32bit
is_32bit => is_32bit,
do_popcnt => '0',
datalen => "0000"
);

clk_process: process

@ -1,60 +0,0 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library work;
use work.helpers.all;

entity zero_counter is
port (
clk : in std_logic;
rs : in std_ulogic_vector(63 downto 0);
count_right : in std_ulogic;
is_32bit : in std_ulogic;
result : out std_ulogic_vector(63 downto 0)
);
end entity zero_counter;

architecture behaviour of zero_counter is
signal inp : std_ulogic_vector(63 downto 0);
signal sum : std_ulogic_vector(64 downto 0);
signal msb_r : std_ulogic;
signal onehot : std_ulogic_vector(63 downto 0);
signal onehot_r : std_ulogic_vector(63 downto 0);
signal bitnum : std_ulogic_vector(5 downto 0);

begin
countzero_r: process(clk)
begin
if rising_edge(clk) then
msb_r <= sum(64);
onehot_r <= onehot;
end if;
end process;

countzero: process(all)
begin
if is_32bit = '0' then
if count_right = '0' then
inp <= bit_reverse(rs);
else
inp <= rs;
end if;
else
inp(63 downto 32) <= x"FFFFFFFF";
if count_right = '0' then
inp(31 downto 0) <= bit_reverse(rs(31 downto 0));
else
inp(31 downto 0) <= rs(31 downto 0);
end if;
end if;

sum <= std_ulogic_vector(unsigned('0' & not inp) + 1);
onehot <= sum(63 downto 0) and inp;

-- The following occurs after a clock edge
bitnum <= bit_number(onehot_r);

result <= x"00000000000000" & "0" & msb_r & bitnum;
end process;
end behaviour;

@ -1121,7 +1121,6 @@ begin
rams: for i in 0 to NUM_WAYS-1 generate
signal do_read : std_ulogic;
signal rd_addr : std_ulogic_vector(ROW_BITS-1 downto 0);
signal do_write : std_ulogic;
signal wr_addr : std_ulogic_vector(ROW_BITS-1 downto 0);
signal wr_data : std_ulogic_vector(wishbone_data_bits-1 downto 0);
signal wr_sel : std_ulogic_vector(ROW_SIZE-1 downto 0);

@ -215,7 +215,6 @@ architecture behaviour of decode2 is
OP_AND => "001", -- logical_result
OP_OR => "001",
OP_XOR => "001",
OP_POPCNT => "001",
OP_PRTY => "001",
OP_CMPB => "001",
OP_EXTS => "001",
@ -234,7 +233,8 @@ architecture behaviour of decode2 is
OP_DIV => "011",
OP_DIVE => "011",
OP_MOD => "011",
OP_CNTZ => "100", -- countzero_result
OP_CNTZ => "100", -- countbits_result
OP_POPCNT => "100",
OP_MFSPR => "101", -- spr_result
OP_B => "110", -- next_nia
OP_BC => "110",

@ -42,6 +42,8 @@ begin
quot <= (others => '0');
running <= '0';
count <= "0000000";
is_32bit <= '0';
overflow <= '0';
elsif d_in.valid = '1' then
if d_in.is_extended = '1' then
dend <= '0' & d_in.dividend & x"0000000000000000";

@ -0,0 +1,298 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.math_real.all;

library work;
use work.wishbone_types.all;

entity dmi_dtm is
generic(ABITS : INTEGER:=8;
DBITS : INTEGER:=64);

port(sys_clk : in std_ulogic;
sys_reset : in std_ulogic;
dmi_addr : out std_ulogic_vector(ABITS - 1 downto 0);
dmi_din : in std_ulogic_vector(DBITS - 1 downto 0);
dmi_dout : out std_ulogic_vector(DBITS - 1 downto 0);
dmi_req : out std_ulogic;
dmi_wr : out std_ulogic;
dmi_ack : in std_ulogic
-- dmi_err : in std_ulogic TODO: Add error response
);
end entity dmi_dtm;

architecture behaviour of dmi_dtm is
-- Signals coming out of the JTAGG block
signal jtag_reset_n : std_ulogic;
signal tdi : std_ulogic;
signal tdo : std_ulogic;
signal tck : std_ulogic;
signal jce1 : std_ulogic;
signal jshift : std_ulogic;
signal update : std_ulogic;

-- signals to match dmi_dtb_xilinx
signal jtag_reset : std_ulogic;
signal capture : std_ulogic;
signal jtag_clk : std_ulogic;
signal sel : std_ulogic;
signal shift : std_ulogic;

-- delays
signal jce1_d : std_ulogic;
constant TCK_DELAY : INTEGER := 8;
signal tck_d : std_ulogic_vector(TCK_DELAY+1 downto 1);

-- ** JTAG clock domain **

-- Shift register
signal shiftr : std_ulogic_vector(ABITS + DBITS + 1 downto 0);

-- Latched request
signal request : std_ulogic_vector(ABITS + DBITS + 1 downto 0);

-- A request is present
signal jtag_req : std_ulogic;

-- Synchronizer for jtag_rsp (sys clk -> jtag_clk)
signal dmi_ack_0 : std_ulogic;
signal dmi_ack_1 : std_ulogic;

-- ** sys clock domain **

-- Synchronizer for jtag_req (jtag clk -> sys clk)
signal jtag_req_0 : std_ulogic;
signal jtag_req_1 : std_ulogic;

-- ** combination signals
signal jtag_bsy : std_ulogic;
signal op_valid : std_ulogic;
signal rsp_op : std_ulogic_vector(1 downto 0);

-- ** Constants **
constant DMI_REQ_NOP : std_ulogic_vector(1 downto 0) := "00";
constant DMI_REQ_RD : std_ulogic_vector(1 downto 0) := "01";
constant DMI_REQ_WR : std_ulogic_vector(1 downto 0) := "10";
constant DMI_RSP_OK : std_ulogic_vector(1 downto 0) := "00";
constant DMI_RSP_BSY : std_ulogic_vector(1 downto 0) := "11";

attribute ASYNC_REG : string;
attribute ASYNC_REG of jtag_req_0: signal is "TRUE";
attribute ASYNC_REG of jtag_req_1: signal is "TRUE";
attribute ASYNC_REG of dmi_ack_0: signal is "TRUE";
attribute ASYNC_REG of dmi_ack_1: signal is "TRUE";

-- ECP5 JTAGG
component JTAGG is
generic (
ER1 : string := "ENABLED";
ER2 : string := "ENABLED"
);
port(
JTDO1 : in std_ulogic;
JTDO2 : in std_ulogic;
JTDI : out std_ulogic;
JTCK : out std_ulogic;
JRTI1 : out std_ulogic;
JRTI2 : out std_ulogic;
JSHIFT : out std_ulogic;
JUPDATE : out std_ulogic;
JRSTN : out std_ulogic;
JCE1 : out std_ulogic;
JCE2 : out std_ulogic
);
end component;

component LUT4 is
generic (
INIT : std_logic_vector
);
port(
A : in STD_ULOGIC;
B : in STD_ULOGIC;
C : in STD_ULOGIC;
D : in STD_ULOGIC;
Z : out STD_ULOGIC
);
end component;

begin

jtag: JTAGG
generic map(
ER2 => "DISABLED"
)
port map (
JTDO1 => tdo,
JTDO2 => '0',
JTDI => tdi,
JTCK => tck,
JRTI1 => open,
JRTI2 => open,
JSHIFT => jshift,
JUPDATE => update,
JRSTN => jtag_reset_n,
JCE1 => jce1,
JCE2 => open
);

-- JRTI1 looks like it could be connected to SEL, but
-- in practise JRTI1 is only high briefly, not for the duration
-- of the transmission. possibly mw_debug could be modified.
-- The ecp5 is probably the only jtag device anyway.
sel <= '1';

-- TDI needs to align with TCK, we use LUT delays here.
-- From https://github.com/enjoy-digital/litex/pull/1087
tck_d(1) <= tck;
del: for i in 1 to TCK_DELAY generate
attribute keep : boolean;
attribute keep of l: label is true;
begin
l: LUT4
generic map(
INIT => b"0000_0000_0000_0010"
)
port map (
A => tck_d(i),
B => '0', C => '0', D => '0',
Z => tck_d(i+1)
);
end generate;
jtag_clk <= tck_d(TCK_DELAY+1);

-- capture signal
jce1_sync : process(jtag_clk)
begin
if rising_edge(jtag_clk) then
jce1_d <= jce1;
capture <= jce1 and not jce1_d;
end if;
end process;

-- latch the shift signal, otherwise
-- we miss the last shift in
-- (maybe because we are delaying tck?)
shift_sync : process(jtag_clk)
begin
if (sys_reset = '1') then
shift <= '0';
elsif rising_edge(jtag_clk) then
shift <= jshift;
end if;
end process;

jtag_reset <= not jtag_reset_n;

-- dmi_req synchronization
dmi_req_sync : process(sys_clk)
begin
-- sys_reset is synchronous
if rising_edge(sys_clk) then
if (sys_reset = '1') then
jtag_req_0 <= '0';
jtag_req_1 <= '0';
else
jtag_req_0 <= jtag_req;
jtag_req_1 <= jtag_req_0;
end if;
end if;
end process;
dmi_req <= jtag_req_1;

-- dmi_ack synchronization
dmi_ack_sync: process(jtag_clk, jtag_reset)
begin
-- jtag_reset is async (see comments)
if jtag_reset = '1' then
dmi_ack_0 <= '0';
dmi_ack_1 <= '0';
elsif rising_edge(jtag_clk) then
dmi_ack_0 <= dmi_ack;
dmi_ack_1 <= dmi_ack_0;
end if;
end process;
-- jtag_bsy indicates whether we can start a new request, we can when
-- we aren't already processing one (jtag_req) and the synchronized ack
-- of the previous one is 0.
--
jtag_bsy <= jtag_req or dmi_ack_1;

-- decode request type in shift register
with shiftr(1 downto 0) select op_valid <=
'1' when DMI_REQ_RD,
'1' when DMI_REQ_WR,
'0' when others;

-- encode response op
rsp_op <= DMI_RSP_BSY when jtag_bsy = '1' else DMI_RSP_OK;

-- Some DMI out signals are directly driven from the request register
dmi_addr <= request(ABITS + DBITS + 1 downto DBITS + 2);
dmi_dout <= request(DBITS + 1 downto 2);
dmi_wr <= '1' when request(1 downto 0) = DMI_REQ_WR else '0';

-- TDO is wired to shift register bit 0
tdo <= shiftr(0);

-- Main state machine. Handles shift registers, request latch and
-- jtag_req latch. Could be split into 3 processes but it's probably
-- not worthwhile.
--
shifter: process(jtag_clk, jtag_reset, sys_reset)
begin
if jtag_reset = '1' or sys_reset = '1' then
shiftr <= (others => '0');
jtag_req <= '0';
request <= (others => '0');
elsif rising_edge(jtag_clk) then

-- Handle jtag "commands" when sel is 1
if sel = '1' then
-- Shift state, rotate the register
if shift = '1' then
shiftr <= tdi & shiftr(ABITS + DBITS + 1 downto 1);
end if;

-- Update state (trigger)
--
-- Latch the request if we aren't already processing one and
-- it has a valid command opcode.
--
if update = '1' and op_valid = '1' then
if jtag_bsy = '0' then
request <= shiftr;
jtag_req <= '1';
end if;
-- Set the shift register "op" to "busy". This will prevent
-- us from re-starting the command on the next update if
-- the command completes before that.
shiftr(1 downto 0) <= DMI_RSP_BSY;
end if;

-- Request completion.
--
-- Capture the response data for reads and clear request flag.
--
-- Note: We clear req (and thus dmi_req) here which relies on tck
-- ticking and sel set. This means we are stuck with dmi_req up if
-- the jtag interface stops. Slaves must be resilient to this.
--
if jtag_req = '1' and dmi_ack_1 = '1' then
jtag_req <= '0';
if request(1 downto 0) = DMI_REQ_RD then
request(DBITS + 1 downto 2) <= dmi_din;
end if;
end if;

-- Capture state, grab latch content with updated status
if capture = '1' then
shiftr <= request(ABITS + DBITS + 1 downto 2) & rsp_op;
end if;

end if;
end if;
end process;
end architecture behaviour;

@ -1,302 +0,0 @@
-- JTAG to DMI interface, based on the Xilinx version
--
-- DMI bus
--
-- req : ____/------------\_____
-- addr: xxxx< >xxxxx, based on the Xilinx version
-- dout: xxxx< >xxxxx
-- wr : xxxx< >xxxxx
-- din : xxxxxxxxxxxx< >xxx
-- ack : ____________/------\___
--
-- * addr/dout set along with req, can be latched on same cycle by slave
-- * ack & din remain up until req is dropped by master, the slave must
-- provide a stable output on din on reads during that time.
-- * req remains low at until at least one sysclk after ack seen down.
--
-- JTAG (tck) DMI (sys_clk)
--
-- * jtag_req = 1
-- (jtag_req_0) *
-- (jtag_req_1) -> * dmi_req = 1 >
-- *.../...
-- * dmi_ack = 1 <
-- * (dmi_ack_0)
-- * <- (dmi_ack_1)
-- * jtag_req = 0 (and latch dmi_din)
-- (jtag_req_0) *
-- (jtag_req_1) -> * dmi_req = 0 >
-- * dmi_ack = 0 <
-- * (dmi_ack_0)
-- * <- (dmi_ack_1)
--
-- jtag_req can go back to 1 when jtag_rsp_1 is 0
--
-- Questions/TODO:
-- - I use 2 flip fops for sync, is that enough ?
-- - I treat the jtag_trst as an async reset, is that necessary ?
-- - Dbl check reset situation since we have two different resets
-- each only resetting part of the logic...
-- - Look at optionally removing the synchronizer on the ack path,
-- assuming JTAG is always slow enough that ack will have been
-- stable long enough by the time CAPTURE comes in.
-- - We could avoid the latched request by not shifting while a
-- request is in progress (and force TDO to 1 to return a busy
-- status).
--
-- WARNING: This isn't the real DMI JTAG protocol (at least not yet).
-- a command while busy will be ignored. A response of "11"
-- means the previous command is still going, try again.
-- As such We don't implement the DMI "error" status, and
-- we don't implement DTMCS yet... This may still all change
-- but for now it's easier that way as the real DMI protocol
-- requires for a command to work properly that enough TCK
-- are sent while IDLE and I'm having trouble getting that
-- working with UrJtag and the Xilinx BSCAN2 for now.

library ieee;
use ieee.std_logic_1164.all;
use ieee.math_real.all;

library work;
use work.wishbone_types.all;

entity dmi_dtm_jtag is
generic(ABITS : INTEGER:=8;
DBITS : INTEGER:=32);

port(sys_clk : in std_ulogic;
sys_reset : in std_ulogic;
dmi_addr : out std_ulogic_vector(ABITS - 1 downto 0);
dmi_din : in std_ulogic_vector(DBITS - 1 downto 0);
dmi_dout : out std_ulogic_vector(DBITS - 1 downto 0);
dmi_req : out std_ulogic;
dmi_wr : out std_ulogic;
dmi_ack : in std_ulogic;
-- dmi_err : in std_ulogic TODO: Add error response
jtag_tck : in std_ulogic;
jtag_tdi : in std_ulogic;
jtag_tms : in std_ulogic;
jtag_trst : in std_ulogic;
jtag_tdo : out std_ulogic
);
end entity dmi_dtm_jtag;

architecture behaviour of dmi_dtm_jtag is

-- Signals coming out of the JTAG TAP controller
signal capture : std_ulogic;
signal update : std_ulogic;
signal sel : std_ulogic;
signal shift : std_ulogic;
signal tdi : std_ulogic;
signal tdo : std_ulogic;

-- ** JTAG clock domain **

-- Shift register
signal shiftr : std_ulogic_vector(ABITS + DBITS + 1 downto 0);

-- Latched request
signal request : std_ulogic_vector(ABITS + DBITS + 1 downto 0);

-- A request is present
signal jtag_req : std_ulogic;

-- Synchronizer for jtag_rsp (sys clk -> jtag_tck)
signal dmi_ack_0 : std_ulogic;
signal dmi_ack_1 : std_ulogic;

-- ** sys clock domain **

-- Synchronizer for jtag_req (jtag clk -> sys clk)
signal jtag_req_0 : std_ulogic;
signal jtag_req_1 : std_ulogic;

-- ** combination signals
signal jtag_bsy : std_ulogic;
signal op_valid : std_ulogic;
signal rsp_op : std_ulogic_vector(1 downto 0);

-- ** Constants **
constant DMI_REQ_NOP : std_ulogic_vector(1 downto 0) := "00";
constant DMI_REQ_RD : std_ulogic_vector(1 downto 0) := "01";
constant DMI_REQ_WR : std_ulogic_vector(1 downto 0) := "10";
constant DMI_RSP_OK : std_ulogic_vector(1 downto 0) := "00";
constant DMI_RSP_BSY : std_ulogic_vector(1 downto 0) := "11";

attribute ASYNC_REG : string;
attribute ASYNC_REG of jtag_req_0: signal is "TRUE";
attribute ASYNC_REG of jtag_req_1: signal is "TRUE";
attribute ASYNC_REG of dmi_ack_0: signal is "TRUE";
attribute ASYNC_REG of dmi_ack_1: signal is "TRUE";

component tap_top port (
-- JTAG pads
tms_pad_i : in std_ulogic;
tck_pad_i : in std_ulogic;
trst_pad_i : in std_ulogic;
tdi_pad_i : in std_ulogic;
tdo_pad_o : out std_ulogic;
tdo_padoe_o : out std_ulogic;

-- TAP states
shift_dr_o : out std_ulogic;
pause_dr_o : out std_ulogic;
update_dr_o : out std_ulogic;
capture_dr_o : out std_ulogic;

-- Select signals for boundary scan or mbist
extest_select_o : out std_ulogic;
sample_preload_select_o : out std_ulogic;
mbist_select_o : out std_ulogic;
debug_select_o : out std_ulogic;

-- TDO signal that is connected to TDI of sub-modules.
tdo_o : out std_ulogic;

-- TDI signals from sub-modules
debug_tdi_i : in std_ulogic;
bs_chain_tdi_i : in std_ulogic;
mbist_tdi_i : in std_ulogic
);
end component;

begin
tap_top0 : tap_top
port map (
tms_pad_i => jtag_tms,
tck_pad_i => jtag_tck,
trst_pad_i => jtag_trst,
tdi_pad_i => jtag_tdi,
tdo_pad_o => jtag_tdo,
tdo_padoe_o => open, -- what to do with this?

shift_dr_o => shift,
pause_dr_o => open, -- what to do with this?
update_dr_o => update,
capture_dr_o => capture,

-- connect boundary scan and mbist?
extest_select_o => open,
sample_preload_select_o => open,
mbist_select_o => open,
debug_select_o => sel,

tdo_o => tdi,
debug_tdi_i => tdo,
bs_chain_tdi_i => '0',
mbist_tdi_i => '0'
);

-- dmi_req synchronization
dmi_req_sync : process(sys_clk)
begin
-- sys_reset is synchronous
if rising_edge(sys_clk) then
if (sys_reset = '1') then
jtag_req_0 <= '0';
jtag_req_1 <= '0';
else
jtag_req_0 <= jtag_req;
jtag_req_1 <= jtag_req_0;
end if;
end if;
end process;
dmi_req <= jtag_req_1;

-- dmi_ack synchronization
dmi_ack_sync: process(jtag_tck, jtag_trst)
begin
-- jtag_trst is async (see comments)
if jtag_trst = '1' then
dmi_ack_0 <= '0';
dmi_ack_1 <= '0';
elsif rising_edge(jtag_tck) then
dmi_ack_0 <= dmi_ack;
dmi_ack_1 <= dmi_ack_0;
end if;
end process;

-- jtag_bsy indicates whether we can start a new request, we can when
-- we aren't already processing one (jtag_req) and the synchronized ack
-- of the previous one is 0.
--
jtag_bsy <= jtag_req or dmi_ack_1;

-- decode request type in shift register
with shiftr(1 downto 0) select op_valid <=
'1' when DMI_REQ_RD,
'1' when DMI_REQ_WR,
'0' when others;

-- encode response op
rsp_op <= DMI_RSP_BSY when jtag_bsy = '1' else DMI_RSP_OK;

-- Some DMI out signals are directly driven from the request register
dmi_addr <= request(ABITS + DBITS + 1 downto DBITS + 2);
dmi_dout <= request(DBITS + 1 downto 2);
dmi_wr <= '1' when request(1 downto 0) = DMI_REQ_WR else '0';

-- TDO is wired to shift register bit 0
tdo <= shiftr(0);

-- Main state machine. Handles shift registers, request latch and
-- jtag_req latch. Could be split into 3 processes but it's probably
-- not worthwhile.
--
shifter: process(jtag_tck, jtag_trst, sys_reset)
begin
if jtag_trst = '1' or sys_reset = '1' then
shiftr <= (others => '0');
jtag_req <= '0';
request <= (others => '0');
elsif rising_edge(jtag_tck) then

-- Handle jtag "commands" when sel is 1
if sel = '1' then
-- Shift state, rotate the register
if shift = '1' then
shiftr <= tdi & shiftr(ABITS + DBITS + 1 downto 1);
end if;

-- Update state (trigger)
--
-- Latch the request if we aren't already processing one and
-- it has a valid command opcode.
--
if update = '1' and op_valid = '1' then
if jtag_bsy = '0' then
request <= shiftr;
jtag_req <= '1';
end if;
-- Set the shift register "op" to "busy". This will prevent
-- us from re-starting the command on the next update if
-- the command completes before that.
shiftr(1 downto 0) <= DMI_RSP_BSY;
end if;

-- Request completion.
--
-- Capture the response data for reads and clear request flag.
--
-- Note: We clear req (and thus dmi_req) here which relies on tck
-- ticking and sel set. This means we are stuck with dmi_req up if
-- the jtag interface stops. Slaves must be resilient to this.
--
if jtag_req = '1' and dmi_ack_1 = '1' then
jtag_req <= '0';
if request(1 downto 0) = DMI_REQ_RD then
request(DBITS + 1 downto 2) <= dmi_din;
end if;
end if;

-- Capture state, grab latch content with updated status
if capture = '1' then
shiftr <= request(ABITS + DBITS + 1 downto 2) & rsp_op;
end if;

end if;
end if;
end process;
end architecture behaviour;

@ -44,6 +44,7 @@ begin
DRAM_ABITS => 24,
DRAM_ALINES => 1,
DRAM_DLINES => 16,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 128,
PAYLOAD_FILE => DRAM_INIT_FILE,
PAYLOAD_SIZE => DRAM_INIT_SIZE

@ -99,21 +99,20 @@ architecture behaviour of execute1 is
signal mshort_p : std_ulogic_vector(31 downto 0) := (others => '0');

signal valid_in : std_ulogic;
signal ctrl: ctrl_t := (others => (others => '0'));
signal ctrl_tmp: ctrl_t := (others => (others => '0'));
signal ctrl: ctrl_t;
signal ctrl_tmp: ctrl_t;
signal right_shift, rot_clear_left, rot_clear_right: std_ulogic;
signal rot_sign_ext: std_ulogic;
signal rotator_result: std_ulogic_vector(63 downto 0);
signal rotator_carry: std_ulogic;
signal logical_result: std_ulogic_vector(63 downto 0);
signal countzero_result: std_ulogic_vector(63 downto 0);
signal do_popcnt: std_ulogic;
signal countbits_result: std_ulogic_vector(63 downto 0);
signal alu_result: std_ulogic_vector(63 downto 0);
signal adder_result: std_ulogic_vector(63 downto 0);
signal misc_result: std_ulogic_vector(63 downto 0);
signal muldiv_result: std_ulogic_vector(63 downto 0);
signal spr_result: std_ulogic_vector(63 downto 0);
signal result_mux_sel: std_ulogic_vector(2 downto 0);
signal sub_mux_sel: std_ulogic_vector(2 downto 0);
signal next_nia : std_ulogic_vector(63 downto 0);
signal current: Decode2ToExecute1Type;

@ -284,13 +283,15 @@ begin
datalen => e_in.data_len
);

countzero_0: entity work.zero_counter
countbits_0: entity work.bit_counter
port map (
clk => clk,
rs => c_in,
count_right => e_in.insn(10),
is_32bit => e_in.is_32bit,
result => countzero_result
do_popcnt => do_popcnt,
datalen => e_in.data_len,
result => countbits_result
);

multiply_0: entity work.multiply
@ -391,7 +392,7 @@ begin
logical_result when "001",
rotator_result when "010",
muldiv_result when "011",
countzero_result when "100",
countbits_result when "100",
spr_result when "101",
next_nia when "110",
misc_result when others;
@ -403,6 +404,7 @@ begin
r <= reg_type_init;
ctrl.tb <= (others => '0');
ctrl.dec <= (others => '0');
ctrl.cfar <= (others => '0');
ctrl.msr <= (MSR_SF => '1', MSR_LE => '1', others => '0');
else
r <= rin;
@ -813,6 +815,8 @@ begin
rot_clear_right <= '1' when e_in.insn_type = OP_RLC or e_in.insn_type = OP_RLCR else '0';
rot_sign_ext <= '1' when e_in.insn_type = OP_EXTSWSLI else '0';

do_popcnt <= '1' when e_in.insn_type = OP_POPCNT else '0';

illegal := '0';
if r.intr_pending = '1' then
v.e.srr1 := r.e.srr1;
@ -963,7 +967,7 @@ begin
when OP_ADDG6S =>
when OP_CMPRB =>
when OP_CMPEQB =>
when OP_AND | OP_OR | OP_XOR | OP_POPCNT | OP_PRTY | OP_CMPB | OP_EXTS |
when OP_AND | OP_OR | OP_XOR | OP_PRTY | OP_CMPB | OP_EXTS |
OP_BPERM | OP_BCD =>

when OP_B =>
@ -1025,7 +1029,7 @@ begin
end if;
do_trace := '0';

when OP_CNTZ =>
when OP_CNTZ | OP_POPCNT =>
v.e.valid := '0';
v.cntz_in_progress := '1';
v.busy := '1';
@ -1220,7 +1224,7 @@ begin
-- valid_in = 0. Hence they don't happen in the same cycle as any of
-- the cases above which depend on valid_in = 1.
if r.cntz_in_progress = '1' then
-- cnt[lt]z always takes two cycles
-- cnt[lt]z and popcnt* always take two cycles
v.e.valid := '1';
elsif r.mul_in_progress = '1' or r.div_in_progress = '1' then
if (r.mul_in_progress = '1' and multiply_to_x.valid = '1') or

@ -89,9 +89,8 @@ begin
r_int.predicted_taken <= r_next_int.predicted_taken;
r_int.pred_not_taken <= r_next_int.pred_not_taken;
r_int.predicted_nia <= r_next_int.predicted_nia;
r_int.rd_is_niap4 <= r_next.sequential;
r_int.rd_is_niap4 <= r_next_int.rd_is_niap4;
end if;
r.sequential <= r_next.sequential and advance_nia;
-- always send the up-to-date stop mark and req
r.stop_mark <= stop_in;
r.req <= not rst;
@ -145,11 +144,11 @@ begin
begin
v := r;
v_int := r_int;
v.sequential := '0';
v.predicted := '0';
v.pred_ntaken := '0';
v_int.predicted_taken := '0';
v_int.pred_not_taken := '0';
v_int.rd_is_niap4 := '0';

if rst = '1' then
if alt_reset_in = '1' then
@ -180,7 +179,7 @@ begin
v.nia := r_int.predicted_nia;
v.predicted := '1';
else
v.sequential := '1';
v_int.rd_is_niap4 := '1';
v.pred_ntaken := r_int.pred_not_taken;
v.nia := std_ulogic_vector(unsigned(r.nia) + 4);
if r_int.mode_32bit = '1' then

@ -94,6 +94,10 @@ architecture behaviour of toplevel is
signal spi_sdat_oe : std_ulogic_vector(3 downto 0);
signal spi_sdat_i : std_ulogic_vector(3 downto 0);

-- ddram clock signals as vectors
signal ddram_clk_p_vec : std_logic_vector(0 downto 0);
signal ddram_clk_n_vec : std_logic_vector(0 downto 0);

-- Fixup various memory sizes based on generics
function get_bram_size return natural is
begin
@ -252,6 +256,9 @@ begin
-- but for now, assert it's 100Mhz
assert CLK_FREQUENCY = 100000000;

ddram_clk_p_vec <= (others => ddram_clk_p);
ddram_clk_n_vec <= (others => ddram_clk_n);

reset_controller: entity work.soc_reset
generic map(
RESET_LOW => false,
@ -272,6 +279,7 @@ begin
DRAM_ABITS => 26,
DRAM_ALINES => 16,
DRAM_DLINES => 16,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 128,
PAYLOAD_FILE => RAM_INIT_FILE,
PAYLOAD_SIZE => PAYLOAD_SIZE
@ -304,8 +312,8 @@ begin
ddram_dq => ddram_dq,
ddram_dqs_p => ddram_dqs_p,
ddram_dqs_n => ddram_dqs_n,
ddram_clk_p => ddram_clk_p,
ddram_clk_n => ddram_clk_n,
ddram_clk_p => ddram_clk_p_vec,
ddram_clk_n => ddram_clk_n_vec,
ddram_cke => ddram_cke,
ddram_odt => ddram_odt,
ddram_reset_n => ddram_reset_n

@ -163,6 +163,10 @@ architecture behaviour of toplevel is
signal gpio_out : std_ulogic_vector(NGPIO - 1 downto 0);
signal gpio_dir : std_ulogic_vector(NGPIO - 1 downto 0);

-- ddram clock signals as vectors
signal ddram_clk_p_vec : std_logic_vector(0 downto 0);
signal ddram_clk_n_vec : std_logic_vector(0 downto 0);

-- Fixup various memory sizes based on generics
function get_bram_size return natural is
begin
@ -382,11 +386,15 @@ begin
end if;
end process;

ddram_clk_p_vec <= (others => ddram_clk_p);
ddram_clk_n_vec <= (others => ddram_clk_n);

dram: entity work.litedram_wrapper
generic map(
DRAM_ABITS => 24,
DRAM_ALINES => 14,
DRAM_DLINES => 16,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 128,
PAYLOAD_FILE => RAM_INIT_FILE,
PAYLOAD_SIZE => PAYLOAD_SIZE
@ -419,8 +427,8 @@ begin
ddram_dq => ddram_dq,
ddram_dqs_p => ddram_dqs_p,
ddram_dqs_n => ddram_dqs_n,
ddram_clk_p => ddram_clk_p,
ddram_clk_n => ddram_clk_n,
ddram_clk_p => ddram_clk_p_vec,
ddram_clk_n => ddram_clk_n_vec,
ddram_cke => ddram_cke,
ddram_odt => ddram_odt,
ddram_reset_n => ddram_reset_n

@ -97,6 +97,10 @@ architecture behaviour of toplevel is
signal spi_sdat_oe : std_ulogic_vector(3 downto 0);
signal spi_sdat_i : std_ulogic_vector(3 downto 0);

-- ddram clock signals as vectors
signal ddram_clk_p_vec : std_logic_vector(0 downto 0);
signal ddram_clk_n_vec : std_logic_vector(0 downto 0);

-- Fixup various memory sizes based on generics
function get_bram_size return natural is
begin
@ -270,11 +274,15 @@ begin
rst_out => open
);

ddram_clk_p_vec <= (others => ddram_clk_p);
ddram_clk_n_vec <= (others => ddram_clk_n);

dram: entity work.litedram_wrapper
generic map(
DRAM_ABITS => 25,
DRAM_ALINES => 15,
DRAM_DLINES => 32,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 256,
PAYLOAD_FILE => RAM_INIT_FILE,
PAYLOAD_SIZE => PAYLOAD_SIZE
@ -307,8 +315,8 @@ begin
ddram_dq => ddram_dq,
ddram_dqs_p => ddram_dqs_p,
ddram_dqs_n => ddram_dqs_n,
ddram_clk_p => ddram_clk_p,
ddram_clk_n => ddram_clk_n,
ddram_clk_p => ddram_clk_p_vec,
ddram_clk_n => ddram_clk_n_vec,
ddram_cke => ddram_cke,
ddram_odt => ddram_odt,
ddram_reset_n => ddram_reset_n

@ -139,6 +139,10 @@ architecture behaviour of toplevel is
signal spi_sdat_oe : std_ulogic_vector(3 downto 0);
signal spi_sdat_i : std_ulogic_vector(3 downto 0);

-- ddram clock signals as vectors
signal ddram_clk_p_vec : std_logic_vector(0 downto 0);
signal ddram_clk_n_vec : std_logic_vector(0 downto 0);

-- Fixup various memory sizes based on generics
function get_bram_size return natural is
begin
@ -330,11 +334,15 @@ begin
end if;
end process;

ddram_clk_p_vec <= (others => ddram_clk_p);
ddram_clk_n_vec <= (others => ddram_clk_n);

dram: entity work.litedram_wrapper
generic map(
DRAM_ABITS => 25,
DRAM_ALINES => 15,
DRAM_DLINES => 16,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 128,
PAYLOAD_FILE => RAM_INIT_FILE,
PAYLOAD_SIZE => PAYLOAD_SIZE
@ -367,8 +375,8 @@ begin
ddram_dq => ddram_dq,
ddram_dqs_p => ddram_dqs_p,
ddram_dqs_n => ddram_dqs_n,
ddram_clk_p => ddram_clk_p,
ddram_clk_n => ddram_clk_n,
ddram_clk_p => ddram_clk_p_vec,
ddram_clk_n => ddram_clk_n_vec,
ddram_cke => ddram_cke,
ddram_odt => ddram_odt,
ddram_reset_n => ddram_reset_n

@ -63,10 +63,10 @@ entity toplevel is
ddram_dm : out std_ulogic_vector(1 downto 0);
ddram_dq : inout std_ulogic_vector(15 downto 0);
ddram_dqs_p : inout std_ulogic_vector(1 downto 0);
ddram_clk_p : out std_ulogic;
ddram_clk_p : out std_ulogic_vector(0 downto 0);
-- only the positive differential pin is instantiated
--ddram_dqs_n : inout std_ulogic_vector(1 downto 0);
--ddram_clk_n : out std_ulogic;
--ddram_clk_n : out std_ulogic_vector(0 downto 0);
ddram_cke : out std_ulogic;
ddram_odt : out std_ulogic;
ddram_reset_n : out std_ulogic;
@ -331,6 +331,7 @@ begin
DRAM_ABITS => 24,
DRAM_ALINES => 14,
DRAM_DLINES => 16,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 128,
NUM_LINES => 8, -- reduce from default of 64 to make smaller/timing
PAYLOAD_FILE => RAM_INIT_FILE,

@ -139,6 +139,10 @@ architecture behaviour of toplevel is
signal spi_sdat_oe : std_ulogic_vector(3 downto 0);
signal spi_sdat_i : std_ulogic_vector(3 downto 0);

-- ddram clock signals as vectors
signal ddram_clk_p_vec : std_ulogic_vector(0 downto 0);
signal ddram_clk_n_vec : std_ulogic_vector(0 downto 0);

-- Fixup various memory sizes based on generics
function get_bram_size return natural is
begin
@ -331,11 +335,15 @@ begin
end if;
end process;

ddram_clk_p_vec <= (others => ddram_clk_p);
ddram_clk_n_vec <= (others => ddram_clk_n);

dram: entity work.litedram_wrapper
generic map(
DRAM_ABITS => 24,
DRAM_ALINES => 14,
DRAM_DLINES => 16,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 128,
PAYLOAD_FILE => RAM_INIT_FILE,
PAYLOAD_SIZE => PAYLOAD_SIZE
@ -368,8 +376,8 @@ begin
ddram_dq => ddram_dq,
ddram_dqs_p => ddram_dqs_p,
ddram_dqs_n => ddram_dqs_n,
ddram_clk_p => ddram_clk_p,
ddram_clk_n => ddram_clk_n,
ddram_clk_p => ddram_clk_p_vec,
ddram_clk_n => ddram_clk_n_vec,
ddram_cke => ddram_cke,
ddram_odt => ddram_odt,
ddram_reset_n => ddram_reset_n

@ -16,7 +16,7 @@ entity fpu is
clk : in std_ulogic;
rst : in std_ulogic;

e_in : in Execute1toFPUType;
e_in : in Execute1ToFPUType;
e_out : out FPUToExecute1Type;

w_out : out FPUToWritebackType
@ -197,7 +197,7 @@ architecture behaviour of fpu is
-- Each output value is the inverse of the center of the input
-- range for the value, i.e. entry 0 is 1 / (1 + 1/512),
-- entry 1 is 1 / (1 + 3/512), etc.
signal inverse_table : lookup_table := (
constant inverse_table : lookup_table := (
-- 1/x lookup table
-- Unit bit is assumed to be 1, so input range is [1, 2)
18x"3fc01", 18x"3f411", 18x"3ec31", 18x"3e460", 18x"3dc9f", 18x"3d4ec", 18x"3cd49", 18x"3c5b5",
@ -549,6 +549,10 @@ begin
r.do_intr <= '0';
r.fpscr <= (others => '0');
r.writing_back <= '0';
r.dest_fpr <= (others =>'0');
r.cr_mask <= (others =>'0');
r.cr_result <= (others =>'0');
r.instr_tag.valid <= '0';
else
assert not (r.state /= IDLE and e_in.valid = '1') severity failure;
r <= rin;

@ -40,8 +40,8 @@ architecture behaviour of gpio is
constant GPIO_REG_DATA_CLR : std_ulogic_vector(GPIO_REG_BITS-1 downto 0) := "00101";

-- Current output value and direction
signal reg_data : std_ulogic_vector(NGPIO - 1 downto 0) := (others => '0');
signal reg_dirn : std_ulogic_vector(NGPIO - 1 downto 0) := (others => '0');
signal reg_data : std_ulogic_vector(NGPIO - 1 downto 0);
signal reg_dirn : std_ulogic_vector(NGPIO - 1 downto 0);
signal reg_in1 : std_ulogic_vector(NGPIO - 1 downto 0);
signal reg_in2 : std_ulogic_vector(NGPIO - 1 downto 0);


@ -60,9 +60,62 @@ _start:

.global boot_entry
boot_entry:
LOAD_IMM64(%r10,__bss_start)
LOAD_IMM64(%r11,__bss_end)
subf %r11,%r10,%r11
addi %r11,%r11,63
srdi. %r11,%r11,6
beq 2f
mtctr %r11
1: dcbz 0,%r10
addi %r10,%r10,64
bdnz 1b

/* setup stack */
LOAD_IMM64(%r1, STACK_TOP - 0x100)
2: LOAD_IMM64(%r1,__stack_top)
li %r0,0
stdu %r0,-32(%r1)
LOAD_IMM64(%r12, main)
mtctr %r12,
mtctr %r12
bctrl
attn // terminate on exit
b .

#define EXCEPTION(nr) \
.= nr ;\
b .

/* More exception stubs */
EXCEPTION(0x300)
EXCEPTION(0x380)
EXCEPTION(0x400)
EXCEPTION(0x480)
EXCEPTION(0x500)
EXCEPTION(0x600)
EXCEPTION(0x700)
EXCEPTION(0x800)
EXCEPTION(0x900)
EXCEPTION(0x980)
EXCEPTION(0xa00)
EXCEPTION(0xb00)
EXCEPTION(0xc00)
EXCEPTION(0xd00)
EXCEPTION(0xe00)
EXCEPTION(0xe20)
EXCEPTION(0xe40)
EXCEPTION(0xe60)
EXCEPTION(0xe80)
EXCEPTION(0xf00)
EXCEPTION(0xf20)
EXCEPTION(0xf40)
EXCEPTION(0xf60)
EXCEPTION(0xf80)
#if 0
EXCEPTION(0x1000)
EXCEPTION(0x1100)
EXCEPTION(0x1200)
EXCEPTION(0x1300)
EXCEPTION(0x1400)
EXCEPTION(0x1500)
EXCEPTION(0x1600)
#endif

Binary file not shown.

Binary file not shown.

@ -35,13 +35,24 @@ a64b5a7d14004a39
a602487d05009f42
a64b5a7d14004a39
2402004ca64b7b7d
3c20000048000004
3d40000048000004
794a07c6614a0000
614a1900654a0000
616b00003d600000
656b0000796b07c6
7d6a5850616b1980
796bd183396b003f
7d6903a641820014
394a00407c0057ec
3c2000004200fff8
782107c660210000
60211f0064210000
618c00003d800000
658c0000798c07c6
7d8903a6618c0414
480000004e800421
6021398064210000
f801ffe138000000
3d8000007c1243a6
798c07c6618c0000
618c1000658c0000
4e8004217d8903a6
4800000000000200
0000000000000000
0000000000000000
0000000000000000
@ -83,6 +94,7 @@ a64b5a7d14004a39
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
@ -98,6 +110,7 @@ a64b5a7d14004a39
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
@ -113,6 +126,7 @@ a64b5a7d14004a39
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
@ -126,148 +140,518 @@ a64b5a7d14004a39
0000000000000000
0000000000000000
0000000000000000
e8010010ebc1fff0
7c0803a6ebe1fff8
3c4000014e800020
7c0802a638428a00
f8010010fbe1fff8
480001edf821ffd1
6000000060000000
4800015538628000
4800004960000000
7c7f1b7860000000
57ff063e5463063e
60000000480000b9
4082ffe02c1f000d
480000a53860000a
4bffffd060000000
0100000000000000
3c40000100000180
6000000038428a00
6000000089228090
2c09000039428088
e92a000041820030
7c0004ac39290014
712900017d204eaa
e86a00004182ffec
7c601eaa7c0004ac
4e8000205463063e
39290010e92a0000
7d204eea7c0004ac
4082ffec71290001
38630008e86a0000
7c601eea7c0004ac
000000004bffffd0
0000000000000000
38428a003c400001
8922809060000000
3942808860000000
4182002c2c090000
39290014e92a0000
7d204eaa7c0004ac
4182ffec71290020
7c0004ace92a0000
4e8000207c604faa
39290010e92a0000
7d204eea7c0004ac
4082ffec71290008
e94a00005469063e
7d2057ea7c0004ac
000000004e800020
0000000000000000
38428a003c400001
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000048000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
384298003c400001
fbe1fff87c0802a6
3be3fffffbc1fff0
f821ffd1f8010010
2c3e00008fdf0001
3821003040820010
4bfffe4438600000
4082000c281e000a
4bffff453860000d
4bffff3d7fc3f378
60000000480001ed
3862800060000000
6000000048000155
6000000048000049
5463063e7c7f1b78
480000b957ff063e
2c1f000d60000000
3860000a4082ffe0
60000000480000a5
000000004bffffd0
0000028001000000
386000007c691b78
2c0a00007d4918ae
386300014d820020
000000004bfffff0
0000000000000000
38428a003c400001
614a00203d40c000
7c0004ac794a0020
3d20c0007d4056ea
61290008794a0600
7c0004ac79290020
712900207d204eea
3d20c00041820018
7929002061290040
0000018001000000
384298003c400001
8922810860000000
3942810060000000
418200302c090000
39290014e92a0000
7d204eaa7c0004ac
4182ffec71290001
7c0004ace86a0000
5463063e7c601eaa
e92a00004e800020
7c0004ac39290010
712900017d204eea
e86a00004082ffec
7c0004ac38630008
4bffffd07c601eea
0000000000000000
3c40000100000000
6000000038429800
6000000089228108
2c09000039428100
e92a00004182002c
7c0004ac39290014
712900207d204eaa
e92a00004182ffec
7c604faa7c0004ac
e92a00004e800020
7c0004ac39290010
712900087d204eea
5469063e4082ffec
7c0004ace94a0000
4e8000207d2057ea
0000000000000000
3c40000100000000
7c0802a638429800
fbc1fff0fbe1fff8
f80100103be3ffff
8fdf0001f821ffd1
408200102c3e0000
3860000038210030
281e000a480001e8
3860000d4082000c
7fc3f3784bffff45
4bffffd04bffff3d
0100000000000000
7c691b7800000280
7d4918ae38600000
4d8200202c0a0000
4bfffff038630001
0000000000000000
3c40000100000000
3d40c00038429800
794a0020614a0020
7d4056ea7c0004ac
794a06003d20c000
7929002061290008
7d204eea7c0004ac
3d00c0007929f804
6108200079290fc3
6000000079080020
3d00001cf9028088
7d4a439261082000
6000000041820084
9922809039200001
6108200c3d00c000
790800203920ff80
7d2047aa7c0004ac
7c0004ace9228088
e92280887d404faa
39290004794ac202
4182001871290020
612900403d20c000
7c0004ac79290020
7929f8047d204eea
79290fc33d00c000
7908002061082000
f902810060000000
610820003d00001c
418200847d4a4392
3920000160000000
3d00c00099228108
3920ff806108200c
7c0004ac79080020
e92281007d2047aa
7d404faa7c0004ac
39400003e9228088
7c0004ac3929000c
e92280887d404faa
7c0004ac39290010
e92280887d404faa
3929000839400007
794ac202e9228100
7c0004ac39290004
e92281007d404faa
3929000c39400003
7d404faa7c0004ac
600000004e800020
99228090394affff
612920183d20c000
7c0004ac79290020
4e8000207d404fea
39290010e9228100
7d404faa7c0004ac
39400007e9228100
7c0004ac39290008
4e8000207d404faa
394affff60000000
3d20c00099228108
7929002061292018
7d404fea7c0004ac
000000004e800020
0000000000000000
3c40000100000000
6000000038428a00
2c09000089228090
e922808860000000
2c23000041820024
4182000878840e28
3929000460840001
7c804faa7c0004ac
2c2400004e800020
6063000241820008
7c0004ac39290020
4e8000207c604fea
384298003c400001
8922810860000000
600000002c090000
41820024e9228100
78840e282c230000
6084000141820008
7c0004ac39290004
4e8000207c804faa
418200082c240000
3929002060630002
7c604fea7c0004ac
000000004e800020
0000000000000000
0000001000000000
e8010010ebc1fff0
7c0803a6ebe1fff8
000000104e800020
00527a0100000000
00010c1b01417804
0000001800000018
00000070fffffc54
00000070fffffc40
9f7e4111300e4600
0000001000000001
00527a0100000000
00010c1b01417804
0000001800000010
00000084fffffc94
00000084fffffc80
0000001000000000
fffffd040000002c
fffffcf00000002c
0000000000000080
0000004000000028
00000060fffffd70
00000060fffffd5c
9e019f0041094500
447e4111300e4302
4106dedf42000e0a
000000100000000b
fffffda40000006c
fffffd900000006c
0000000000000028
0000008000000010
0000012cfffffdb8
0000012cfffffda4
0000001000000000
fffffed000000094
fffffebc00000094
0000000000000068
0000000000000000
0000000000000000
@ -318,6 +702,70 @@ fffffed000000094
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
4f4f6f2e2020200a
0a20202020202e6f
2020202020222e20

@ -1,13 +1,27 @@
SECTIONS
{
_start = .;
. = 0;
_start = .;
.head : {
KEEP(*(.head))
}
. = 0x400;
.text : { *(.text) }
. = 0xA00;
.data : { *(.data) }
.bss : { *(.bss) }
}
. = 0x1000;
.text : { *(.text) *(.text.*) *(.rodata) *(.rodata.*) }
. = 0x1800;
.data : { *(.data) *(.data.*) *(.got) *(.toc) }
. = ALIGN(0x80);
__bss_start = .;
.bss : {
*(.dynsbss)
*(.sbss)
*(.scommon)
*(.dynbss)
*(.bss)
*(.common)
*(.bss.*)
}
. = ALIGN(0x80);
__bss_end = .;
. = . + 0x2000;
__stack_top = .;
}

@ -28,7 +28,9 @@ package helpers is

function bit_reverse(a: std_ulogic_vector) return std_ulogic_vector;
function bit_number(a: std_ulogic_vector(63 downto 0)) return std_ulogic_vector;
function edgelocation(v: std_ulogic_vector; nbits: natural) return std_ulogic_vector;
function count_left_zeroes(val: std_ulogic_vector) return std_ulogic_vector;
function count_right_zeroes(val: std_ulogic_vector) return std_ulogic_vector;
end package helpers;

package body helpers is
@ -247,16 +249,50 @@ package body helpers is
return ret;
end;

-- Count leading zeroes operation
-- Assuming the input 'v' is a value of the form 1...10...0,
-- the output is the bit number of the rightmost 1 bit in v.
-- If v is zero, the result is zero.
function edgelocation(v: std_ulogic_vector; nbits: natural) return std_ulogic_vector is
variable p: std_ulogic_vector(nbits - 1 downto 0);
variable stride: natural;
variable b: std_ulogic;
variable k: natural;
begin
stride := 2;
for i in 0 to nbits - 1 loop
b := '0';
for j in 0 to (2**nbits / stride) - 1 loop
k := j * stride;
b := b or (v(k + stride - 1) and not v(k + (stride/2) - 1));
end loop;
p(i) := b;
stride := stride * 2;
end loop;
return p;
end function;

-- Count leading zeroes operations
-- Assumes the value passed in is not zero (if it is, zero is returned)
function count_left_zeroes(val: std_ulogic_vector) return std_ulogic_vector is
variable rev: std_ulogic_vector(val'left downto val'right);
function count_right_zeroes(val: std_ulogic_vector) return std_ulogic_vector is
variable sum: std_ulogic_vector(val'left downto val'right);
variable onehot: std_ulogic_vector(val'left downto val'right);
variable edge: std_ulogic_vector(val'left downto val'right);
variable bn, bn_e, bn_o: std_ulogic_vector(5 downto 0);
begin
sum := std_ulogic_vector(- signed(val));
onehot := sum and val;
edge := sum or val;
bn_e := edgelocation(std_ulogic_vector(resize(signed(edge), 64)), 6);
bn_o := bit_number(std_ulogic_vector(resize(unsigned(onehot), 64)));
bn := bn_e(5 downto 2) & bn_o(1 downto 0);
return bn;
end;

function count_left_zeroes(val: std_ulogic_vector) return std_ulogic_vector is
variable rev: std_ulogic_vector(val'left downto val'right);
begin
rev := bit_reverse(val);
sum := std_ulogic_vector(- signed(rev));
onehot := sum and rev;
return bit_number(std_ulogic_vector(resize(unsigned(onehot), 64)));
return count_right_zeroes(rev);
end;

end package body helpers;

@ -212,7 +212,6 @@ architecture rtl of icache is
signal ra_valid : std_ulogic;
signal priv_fault : std_ulogic;
signal access_ok : std_ulogic;
signal use_previous : std_ulogic;

-- Cache RAM interface
type cache_ram_out_t is array(way_t) of cache_row_t;
@ -397,7 +396,7 @@ begin
wr_dat(ii * 8 + 7 downto ii * 8) <= wishbone_in.dat(j * 8 + 7 downto j * 8);
end loop;
end if;
do_read <= not (stall_in or use_previous);
do_read <= not stall_in;
do_write <= '0';
if wishbone_in.ack = '1' and replace_way = i then
do_write <= '1';
@ -503,16 +502,6 @@ begin
variable is_hit : std_ulogic;
variable hit_way : way_t;
begin
-- i_in.sequential means that i_in.nia this cycle is 4 more than
-- last cycle. If we read more than 32 bits at a time, had a cache hit
-- last cycle, and we don't want the first 32-bit chunk, then we can
-- keep the data we read last cycle and just use that.
if unsigned(i_in.nia(INSN_BITS+2-1 downto 2)) /= 0 then
use_previous <= i_in.req and i_in.sequential and r.hit_valid;
else
use_previous <= '0';
end if;

-- Extract line, row and tag from request
req_index <= get_index(i_in.nia);
req_row <= get_row(i_in.nia);
@ -542,7 +531,7 @@ begin
end loop;

-- Generate the "hit" and "miss" signals for the synchronous blocks
if i_in.req = '1' and access_ok = '1' and flush_in = '0' and rst = '0' and use_previous = '0' then
if i_in.req = '1' and access_ok = '1' and flush_in = '0' and rst = '0' then
req_is_hit <= is_hit;
req_is_miss <= not is_hit;
else
@ -566,7 +555,11 @@ begin
-- I prefer not to do just yet as it would force fetch2 to know about
-- some of the cache geometry information.
--
i_out.insn <= read_insn_word(r.hit_nia, cache_out(r.hit_way));
if r.hit_valid = '1' then
i_out.insn <= read_insn_word(r.hit_nia, cache_out(r.hit_way));
else
i_out.insn <= (others => '0');
end if;
i_out.valid <= r.hit_valid;
i_out.nia <= r.hit_nia;
i_out.stop_mark <= r.hit_smark;
@ -576,7 +569,7 @@ begin
i_out.next_pred_ntaken <= i_in.pred_ntaken;

-- Stall fetch1 if we have a miss on cache or TLB or a protection fault
stall_out <= not (is_hit and access_ok) and not use_previous;
stall_out <= not (is_hit and access_ok);

-- Wishbone requests output (from the cache miss reload machine)
wishbone_out <= r.wb;
@ -588,8 +581,7 @@ begin
if rising_edge(clk) then
-- keep outputs to fetch2 unchanged on a stall
-- except that flush or reset sets valid to 0
-- If use_previous, keep the same data as last cycle and use the second half
if stall_in = '1' or use_previous = '1' then
if stall_in = '1' then
if rst = '1' or flush_in = '1' then
r.hit_valid <= '0';
end if;
@ -832,4 +824,7 @@ begin
end process;
log_out <= log_data;
end generate;

events <= ev;

end;

@ -74,6 +74,9 @@ begin
i_out.req <= '0';
i_out.nia <= (others => '0');
i_out.stop_mark <= '0';
i_out.priv_mode <= '1';
i_out.virt_mode <= '0';
i_out.big_endian <= '0';

m_out.tlbld <= '0';
m_out.tlbie <= '0';

@ -1,636 +0,0 @@
//////////////////////////////////////////////////////////////////////
//// ////
//// tap_top.v ////
//// ////
//// ////
//// This file is part of the JTAG Test Access Port (TAP) ////
//// http://www.opencores.org/projects/jtag/ ////
//// ////
//// Author(s): ////
//// Igor Mohor (igorm@opencores.org) ////
//// ////
//// ////
//// All additional information is avaliable in the README.txt ////
//// file. ////
//// ////
//////////////////////////////////////////////////////////////////////
//// ////
//// Copyright (C) 2000 - 2003 Authors ////
//// ////
//// This source file may be used and distributed without ////
//// restriction provided that this copyright statement is not ////
//// removed from the file and that any derivative work contains ////
//// the original copyright notice and the associated disclaimer. ////
//// ////
//// This source file is free software; you can redistribute it ////
//// and/or modify it under the terms of the GNU Lesser General ////
//// Public License as published by the Free Software Foundation; ////
//// either version 2.1 of the License, or (at your option) any ////
//// later version. ////
//// ////
//// This source is distributed in the hope that it will be ////
//// useful, but WITHOUT ANY WARRANTY; without even the implied ////
//// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR ////
//// PURPOSE. See the GNU Lesser General Public License for more ////
//// details. ////
//// ////
//// You should have received a copy of the GNU Lesser General ////
//// Public License along with this source; if not, download it ////
//// from http://www.opencores.org/lgpl.shtml ////
//// ////
//////////////////////////////////////////////////////////////////////
//
// CVS Revision History
//
// $Log: not supported by cvs2svn $
// Revision 1.5 2004/01/18 09:27:39 simons
// Blocking non blocking assignmenst fixed.
//
// Revision 1.4 2004/01/17 17:37:44 mohor
// capture_dr_o added to ports.
//
// Revision 1.3 2004/01/14 13:50:56 mohor
// 5 consecutive TMS=1 causes reset of TAP.
//
// Revision 1.2 2004/01/08 10:29:44 mohor
// Control signals for tdo_pad_o mux are changed to negedge.
//
// Revision 1.1 2003/12/23 14:52:14 mohor
// Directory structure changed. New version of TAP.
//
// Revision 1.10 2003/10/23 18:08:01 mohor
// MBIST chain connection fixed.
//
// Revision 1.9 2003/10/23 16:17:02 mohor
// CRC logic changed.
//
// Revision 1.8 2003/10/21 09:48:31 simons
// Mbist support added.
//
// Revision 1.7 2002/11/06 14:30:10 mohor
// Trst active high. Inverted on higher layer.
//
// Revision 1.6 2002/04/22 12:55:56 mohor
// tdo_padoen_o changed to tdo_padoe_o. Signal is active high.
//
// Revision 1.5 2002/03/26 14:23:38 mohor
// Signal tdo_padoe_o changed back to tdo_padoen_o.
//
// Revision 1.4 2002/03/25 13:16:15 mohor
// tdo_padoen_o changed to tdo_padoe_o. Signal was always active high, just
// not named correctly.
//
// Revision 1.3 2002/03/12 14:30:05 mohor
// Few outputs for boundary scan chain added.
//
// Revision 1.2 2002/03/12 10:31:53 mohor
// tap_top and dbg_top modules are put into two separate modules. tap_top
// contains only tap state machine and related logic. dbg_top contains all
// logic necessery for debugging.
//
// Revision 1.1 2002/03/08 15:28:16 mohor
// Structure changed. Hooks for jtag chain added.
//
//
//
//

// Top module
module tap_top #(parameter
IDCODE_VALUE = 32'h14d57049,
IR_LENGTH = 6)
(
// JTAG pads
tms_pad_i,
tck_pad_i,
trst_pad_i,
tdi_pad_i,
tdo_pad_o,
tdo_padoe_o,

// TAP states
shift_dr_o,
pause_dr_o,
update_dr_o,
capture_dr_o,
// Select signals for boundary scan or mbist
extest_select_o,
sample_preload_select_o,
mbist_select_o,
debug_select_o,
// TDO signal that is connected to TDI of sub-modules.
tdo_o,
// TDI signals from sub-modules
debug_tdi_i, // from debug module
bs_chain_tdi_i, // from Boundary Scan Chain
mbist_tdi_i // from Mbist Chain
);


// JTAG pins
input tms_pad_i; // JTAG test mode select pad
input tck_pad_i; // JTAG test clock pad
input trst_pad_i; // JTAG test reset pad
input tdi_pad_i; // JTAG test data input pad
output tdo_pad_o; // JTAG test data output pad
output tdo_padoe_o; // Output enable for JTAG test data output pad

// TAP states
output shift_dr_o;
output pause_dr_o;
output update_dr_o;
output capture_dr_o;

// Select signals for boundary scan or mbist
output extest_select_o;
output sample_preload_select_o;
output mbist_select_o;
output debug_select_o;

// TDO signal that is connected to TDI of sub-modules.
output tdo_o;

// TDI signals from sub-modules
input debug_tdi_i; // from debug module
input bs_chain_tdi_i; // from Boundary Scan Chain
input mbist_tdi_i; // from Mbist Chain

//Internal constants
localparam EXTEST = 6'b000000;
localparam SAMPLE_PRELOAD = 6'b000001;
localparam IDCODE = 6'b001001;
localparam DEBUG = 6'b000011;
localparam MBIST = 6'b001010;
localparam BYPASS = 6'b111111;

// Registers
reg test_logic_reset;
reg run_test_idle;
reg select_dr_scan;
reg capture_dr;
reg shift_dr;
reg exit1_dr;
reg pause_dr;
reg exit2_dr;
reg update_dr;
reg select_ir_scan;
reg capture_ir;
reg shift_ir, shift_ir_neg;
reg exit1_ir;
reg pause_ir;
reg exit2_ir;
reg update_ir;
reg extest_select;
reg sample_preload_select;
reg idcode_select;
reg mbist_select;
reg debug_select;
reg bypass_select;
reg tdo_pad_o;
reg tdo_padoe_o;
reg tms_q1, tms_q2, tms_q3, tms_q4;
wire tms_reset;

assign tdo_o = tdi_pad_i;
assign shift_dr_o = shift_dr;
assign pause_dr_o = pause_dr;
assign update_dr_o = update_dr;
assign capture_dr_o = capture_dr;

assign extest_select_o = extest_select;
assign sample_preload_select_o = sample_preload_select;
assign mbist_select_o = mbist_select;
assign debug_select_o = debug_select;


always @ (posedge tck_pad_i)
begin
tms_q1 <= tms_pad_i;
tms_q2 <= tms_q1;
tms_q3 <= tms_q2;
tms_q4 <= tms_q3;
end


assign tms_reset = tms_q1 & tms_q2 & tms_q3 & tms_q4 & tms_pad_i; // 5 consecutive TMS=1 causes reset


/**********************************************************************************
* *
* TAP State Machine: Fully JTAG compliant *
* *
**********************************************************************************/

// test_logic_reset state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
test_logic_reset<= 1'b1;
else if (tms_reset)
test_logic_reset<= 1'b1;
else
begin
if(tms_pad_i & (test_logic_reset | select_ir_scan))
test_logic_reset<= 1'b1;
else
test_logic_reset<= 1'b0;
end
end

// run_test_idle state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
run_test_idle<= 1'b0;
else if (tms_reset)
run_test_idle<= 1'b0;
else
if(~tms_pad_i & (test_logic_reset | run_test_idle | update_dr | update_ir))
run_test_idle<= 1'b1;
else
run_test_idle<= 1'b0;
end

// select_dr_scan state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
select_dr_scan<= 1'b0;
else if (tms_reset)
select_dr_scan<= 1'b0;
else
if(tms_pad_i & (run_test_idle | update_dr | update_ir))
select_dr_scan<= 1'b1;
else
select_dr_scan<= 1'b0;
end

// capture_dr state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
capture_dr<= 1'b0;
else if (tms_reset)
capture_dr<= 1'b0;
else
if(~tms_pad_i & select_dr_scan)
capture_dr<= 1'b1;
else
capture_dr<= 1'b0;
end

// shift_dr state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
shift_dr<= 1'b0;
else if (tms_reset)
shift_dr<= 1'b0;
else
if(~tms_pad_i & (capture_dr | shift_dr | exit2_dr))
shift_dr<= 1'b1;
else
shift_dr<= 1'b0;
end

// exit1_dr state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
exit1_dr<= 1'b0;
else if (tms_reset)
exit1_dr<= 1'b0;
else
if(tms_pad_i & (capture_dr | shift_dr))
exit1_dr<= 1'b1;
else
exit1_dr<= 1'b0;
end

// pause_dr state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
pause_dr<= 1'b0;
else if (tms_reset)
pause_dr<= 1'b0;
else
if(~tms_pad_i & (exit1_dr | pause_dr))
pause_dr<= 1'b1;
else
pause_dr<= 1'b0;
end

// exit2_dr state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
exit2_dr<= 1'b0;
else if (tms_reset)
exit2_dr<= 1'b0;
else
if(tms_pad_i & pause_dr)
exit2_dr<= 1'b1;
else
exit2_dr<= 1'b0;
end

// update_dr state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
update_dr<= 1'b0;
else if (tms_reset)
update_dr<= 1'b0;
else
if(tms_pad_i & (exit1_dr | exit2_dr))
update_dr<= 1'b1;
else
update_dr<= 1'b0;
end

// select_ir_scan state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
select_ir_scan<= 1'b0;
else if (tms_reset)
select_ir_scan<= 1'b0;
else
if(tms_pad_i & select_dr_scan)
select_ir_scan<= 1'b1;
else
select_ir_scan<= 1'b0;
end

// capture_ir state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
capture_ir<= 1'b0;
else if (tms_reset)
capture_ir<= 1'b0;
else
if(~tms_pad_i & select_ir_scan)
capture_ir<= 1'b1;
else
capture_ir<= 1'b0;
end

// shift_ir state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
shift_ir<= 1'b0;
else if (tms_reset)
shift_ir<= 1'b0;
else
if(~tms_pad_i & (capture_ir | shift_ir | exit2_ir))
shift_ir<= 1'b1;
else
shift_ir<= 1'b0;
end

// exit1_ir state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
exit1_ir<= 1'b0;
else if (tms_reset)
exit1_ir<= 1'b0;
else
if(tms_pad_i & (capture_ir | shift_ir))
exit1_ir<= 1'b1;
else
exit1_ir<= 1'b0;
end

// pause_ir state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
pause_ir<= 1'b0;
else if (tms_reset)
pause_ir<= 1'b0;
else
if(~tms_pad_i & (exit1_ir | pause_ir))
pause_ir<= 1'b1;
else
pause_ir<= 1'b0;
end

// exit2_ir state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
exit2_ir<= 1'b0;
else if (tms_reset)
exit2_ir<= 1'b0;
else
if(tms_pad_i & pause_ir)
exit2_ir<= 1'b1;
else
exit2_ir<= 1'b0;
end

// update_ir state
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
update_ir<= 1'b0;
else if (tms_reset)
update_ir<= 1'b0;
else
if(tms_pad_i & (exit1_ir | exit2_ir))
update_ir<= 1'b1;
else
update_ir<= 1'b0;
end

/**********************************************************************************
* *
* End: TAP State Machine *
* *
**********************************************************************************/



/**********************************************************************************
* *
* jtag_ir: JTAG Instruction Register *
* *
**********************************************************************************/
reg [IR_LENGTH-1:0] jtag_ir; // Instruction register
reg [IR_LENGTH-1:0] latched_jtag_ir, latched_jtag_ir_neg;
reg instruction_tdo;

always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
jtag_ir[IR_LENGTH-1:0] <= {IR_LENGTH{1'b0}};
else if(capture_ir)
jtag_ir <= 6'b000101; // This value is fixed for easier fault detection
else if(shift_ir)
jtag_ir[IR_LENGTH-1:0] <= {tdi_pad_i, jtag_ir[IR_LENGTH-1:1]};
end

always @ (negedge tck_pad_i)
begin
instruction_tdo <= jtag_ir[0];
end
/**********************************************************************************
* *
* End: jtag_ir *
* *
**********************************************************************************/



/**********************************************************************************
* *
* idcode logic *
* *
**********************************************************************************/
reg [31:0] idcode_reg;
reg idcode_tdo;

always @ (posedge tck_pad_i)
begin
if(idcode_select & shift_dr)
idcode_reg <= {tdi_pad_i, idcode_reg[31:1]};
else
idcode_reg <= IDCODE_VALUE;
end

always @ (negedge tck_pad_i)
begin
idcode_tdo <= idcode_reg[0];
end
/**********************************************************************************
* *
* End: idcode logic *
* *
**********************************************************************************/


/**********************************************************************************
* *
* Bypass logic *
* *
**********************************************************************************/
reg bypassed_tdo;
reg bypass_reg;

always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if (trst_pad_i)
bypass_reg<= 1'b0;
else if(shift_dr)
bypass_reg<= tdi_pad_i;
end

always @ (negedge tck_pad_i)
begin
bypassed_tdo <= bypass_reg;
end
/**********************************************************************************
* *
* End: Bypass logic *
* *
**********************************************************************************/


/**********************************************************************************
* *
* Activating Instructions *
* *
**********************************************************************************/
// Updating jtag_ir (Instruction Register)
always @ (posedge tck_pad_i or posedge trst_pad_i)
begin
if(trst_pad_i)
latched_jtag_ir <= IDCODE; // IDCODE selected after reset
else if (tms_reset)
latched_jtag_ir <= IDCODE; // IDCODE selected after reset
else if(update_ir)
latched_jtag_ir <= jtag_ir;
end

/**********************************************************************************
* *
* End: Activating Instructions *
* *
**********************************************************************************/


// Updating jtag_ir (Instruction Register)
always @ (latched_jtag_ir)
begin
extest_select = 1'b0;
sample_preload_select = 1'b0;
idcode_select = 1'b0;
mbist_select = 1'b0;
debug_select = 1'b0;
bypass_select = 1'b0;

case(latched_jtag_ir) /* synthesis parallel_case */
EXTEST: extest_select = 1'b1; // External test
SAMPLE_PRELOAD: sample_preload_select = 1'b1; // Sample preload
IDCODE: idcode_select = 1'b1; // ID Code
MBIST: mbist_select = 1'b1; // Mbist test
DEBUG: debug_select = 1'b1; // Debug
BYPASS: bypass_select = 1'b1; // BYPASS
default: bypass_select = 1'b1; // BYPASS
endcase
end



/**********************************************************************************
* *
* Multiplexing TDO data *
* *
**********************************************************************************/
always @ (shift_ir_neg or exit1_ir or instruction_tdo or latched_jtag_ir_neg or idcode_tdo or
debug_tdi_i or bs_chain_tdi_i or mbist_tdi_i or
bypassed_tdo)
begin
if(shift_ir_neg)
tdo_pad_o = instruction_tdo;
else
begin
case(latched_jtag_ir_neg) // synthesis parallel_case
IDCODE: tdo_pad_o = idcode_tdo; // Reading ID code
DEBUG: tdo_pad_o = debug_tdi_i; // Debug
SAMPLE_PRELOAD: tdo_pad_o = bs_chain_tdi_i; // Sampling/Preloading
EXTEST: tdo_pad_o = bs_chain_tdi_i; // External test
MBIST: tdo_pad_o = mbist_tdi_i; // Mbist test
default: tdo_pad_o = bypassed_tdo; // BYPASS instruction
endcase
end
end


// Tristate control for tdo_pad_o pin
always @ (negedge tck_pad_i)
begin
tdo_padoe_o <= shift_ir | shift_dr | (pause_dr & debug_select);
end
/**********************************************************************************
* *
* End: Multiplexing TDO data *
* *
**********************************************************************************/


always @ (negedge tck_pad_i)
begin
shift_ir_neg <= shift_ir;
latched_jtag_ir_neg <= latched_jtag_ir;
end


endmodule

@ -13,6 +13,7 @@ entity litedram_wrapper is
DRAM_ABITS : positive;
DRAM_ALINES : natural;
DRAM_DLINES : natural;
DRAM_CKLINES : natural;
DRAM_PORT_WIDTH : positive;

-- Pseudo-ROM payload
@ -69,8 +70,8 @@ entity litedram_wrapper is
ddram_dq : inout std_ulogic_vector(DRAM_DLINES-1 downto 0);
ddram_dqs_p : inout std_ulogic_vector(DRAM_DLINES/8-1 downto 0);
ddram_dqs_n : inout std_ulogic_vector(DRAM_DLINES/8-1 downto 0);
ddram_clk_p : out std_ulogic;
ddram_clk_n : out std_ulogic;
ddram_clk_p : out std_ulogic_vector(DRAM_CKLINES-1 downto 0);
ddram_clk_n : out std_ulogic_vector(DRAM_CKLINES-1 downto 0);
ddram_cke : out std_ulogic;
ddram_odt : out std_ulogic;
ddram_reset_n : out std_ulogic
@ -93,8 +94,8 @@ architecture behaviour of litedram_wrapper is
ddram_dq : inout std_ulogic_vector(DRAM_DLINES-1 downto 0);
ddram_dqs_p : inout std_ulogic_vector(DRAM_DLINES/8-1 downto 0);
ddram_dqs_n : inout std_ulogic_vector(DRAM_DLINES/8-1 downto 0);
ddram_clk_p : out std_ulogic;
ddram_clk_n : out std_ulogic;
ddram_clk_p : out std_ulogic_vector(DRAM_CKLINES-1 downto 0);
ddram_clk_n : out std_ulogic_vector(DRAM_CKLINES-1 downto 0);
ddram_cke : out std_ulogic;
ddram_odt : out std_ulogic;
ddram_reset_n : out std_ulogic;

@ -102,8 +102,8 @@ entity litedram_core is
ddram_dq : inout std_ulogic_vector(15 downto 0);
ddram_dqs_p : inout std_ulogic_vector(1 downto 0);
ddram_dqs_n : inout std_ulogic_vector(1 downto 0);
ddram_clk_p : out std_ulogic;
ddram_clk_n : out std_ulogic;
ddram_clk_p : out std_ulogic_vector(0 downto 0);
ddram_clk_n : out std_ulogic_vector(0 downto 0);
ddram_cke : out std_ulogic;
ddram_odt : out std_ulogic;
ddram_reset_n : out std_ulogic;

@ -275,10 +275,27 @@ begin
if rising_edge(clk) then
if rst = '1' then
r1.req.valid <= '0';
r1.req.tlbie <= '0';
r1.req.is_slbia <= '0';
r1.req.instr_fault <= '0';
r1.req.load <= '0';
r1.req.priv_mode <= '0';
r1.req.sprn <= (others => '0');
r1.req.xerc <= xerc_init;

r2.req.valid <= '0';
r2.req.tlbie <= '0';
r2.req.is_slbia <= '0';
r2.req.instr_fault <= '0';
r2.req.load <= '0';
r2.req.priv_mode <= '0';
r2.req.sprn <= (others => '0');
r2.req.xerc <= xerc_init;

r2.wait_dc <= '0';
r2.wait_mmu <= '0';
r2.one_cycle <= '0';

r3.dar <= (others => '0');
r3.dsisr <= (others => '0');
r3.state <= IDLE;
@ -286,6 +303,8 @@ begin
r3.interrupt <= '0';
r3.stage1_en <= '1';
r3.convert_lfs <= '0';
r3.events.load_complete <= '0';
r3.events.store_complete <= '0';
flushing <= '0';
else
r1 <= r1in;

@ -20,20 +20,7 @@ end entity logical;

architecture behaviour of logical is

subtype twobit is unsigned(1 downto 0);
type twobit32 is array(0 to 31) of twobit;
signal pc2 : twobit32;
subtype threebit is unsigned(2 downto 0);
type threebit16 is array(0 to 15) of threebit;
signal pc4 : threebit16;
subtype fourbit is unsigned(3 downto 0);
type fourbit8 is array(0 to 7) of fourbit;
signal pc8 : fourbit8;
subtype sixbit is unsigned(5 downto 0);
type sixbit2 is array(0 to 1) of sixbit;
signal pc32 : sixbit2;
signal par0, par1 : std_ulogic;
signal popcnt : std_ulogic_vector(63 downto 0);
signal parity : std_ulogic_vector(63 downto 0);
signal permute : std_ulogic_vector(7 downto 0);

@ -109,35 +96,6 @@ begin
variable negative : std_ulogic;
variable j : integer;
begin
-- population counts
for i in 0 to 31 loop
pc2(i) <= unsigned("0" & rs(i * 2 downto i * 2)) + unsigned("0" & rs(i * 2 + 1 downto i * 2 + 1));
end loop;
for i in 0 to 15 loop
pc4(i) <= ('0' & pc2(i * 2)) + ('0' & pc2(i * 2 + 1));
end loop;
for i in 0 to 7 loop
pc8(i) <= ('0' & pc4(i * 2)) + ('0' & pc4(i * 2 + 1));
end loop;
for i in 0 to 1 loop
pc32(i) <= ("00" & pc8(i * 4)) + ("00" & pc8(i * 4 + 1)) +
("00" & pc8(i * 4 + 2)) + ("00" & pc8(i * 4 + 3));
end loop;
popcnt <= (others => '0');
if datalen(3 downto 2) = "00" then
-- popcntb
for i in 0 to 7 loop
popcnt(i * 8 + 3 downto i * 8) <= std_ulogic_vector(pc8(i));
end loop;
elsif datalen(3) = '0' then
-- popcntw
for i in 0 to 1 loop
popcnt(i * 32 + 5 downto i * 32) <= std_ulogic_vector(pc32(i));
end loop;
else
popcnt(6 downto 0) <= std_ulogic_vector(('0' & pc32(0)) + ('0' & pc32(1)));
end if;

-- parity calculations
par0 <= rs(0) xor rs(8) xor rs(16) xor rs(24);
par1 <= rs(32) xor rs(40) xor rs(48) xor rs(56);
@ -178,8 +136,6 @@ begin
tmp := not tmp;
end if;

when OP_POPCNT =>
tmp := popcnt;
when OP_PRTY =>
tmp := parity;
when OP_CMPB =>

@ -18,7 +18,7 @@ filesets:
- ppc_fx_insns.vhdl
- sim_console.vhdl
- logical.vhdl
- countzero.vhdl
- countbits.vhdl
- control.vhdl
- execute1.vhdl
- fpu.vhdl

@ -1,10 +1,16 @@
CFLAGS = -O2 -g -Wall -std=c99
# CFLAGS += -I urjtag/urjtag/include/ -L urjtag/urjtag/src/.libs/
#
ifeq ($(STATIC_URJTAG), 1)
LIBURJTAG=-Wl,-Bstatic -lurjtag -Wl,-Bdynamic -lftdi1 -lusb-1.0 -lreadline
else
LIBURJTAG=-lurjtag
endif

all: mw_debug

mw_debug: mw_debug.c
$(CC) -o $@ $^ $(CFLAGS) -lurjtag
$(CC) -o $@ $^ $(CFLAGS) $(LIBURJTAG)

clean:
rm -f mw_debug

@ -49,7 +49,7 @@
static bool debug;

struct backend {
int (*init)(const char *target);
int (*init)(const char *target, int freq);
int (*reset)(void);
int (*command)(uint8_t op, uint8_t addr, uint64_t *data);
};
@ -67,13 +67,15 @@ static void check(int r, const char *failstr)

static int sim_fd = -1;

static int sim_init(const char *target)
static int sim_init(const char *target, int freq)
{
struct sockaddr_in saddr;
struct hostent *hp;
const char *p, *host;
int port, rc;

(void)freq;

if (!target)
target = "localhost:13245";
p = strchr(target, ':');
@ -210,22 +212,33 @@ static struct backend sim_backend = {

static urj_chain_t *jc;

static int jtag_init(const char *target)
static int common_jtag_init(const char *target, int freq)
{
const char *sep;
const char *cable;
char *params[] = { NULL, };
urj_part_t *p;
uint32_t id;
int rc, part;
const int max_params = 20;
char *params[max_params+1];
int rc;

if (!target)
target = "probe";
sep = strchr(target, ':');
memset(params, 0x0, sizeof(params));
sep = strchr(target, ' ');
cable = strndup(target, sep - target);
if (sep && *sep) {
fprintf(stderr, "jtag cable params not supported yet\n");
return -1;
char *param_str = strdup(sep);
char *s = param_str;
for (int i = 0; *s; s++) {
if (*s == ' ') {
if (i >= max_params) {
fprintf(stderr, "Too many jtag cable params\n");
return -1;
}
*s = '\0';
params[i] = s+1;
i++;
}
}
}
if (debug)
printf("Opening jtag backend cable '%s'\n", cable);
@ -241,17 +254,35 @@ static int jtag_init(const char *target)
char *cparams[] = { NULL, NULL,};
rc = urj_tap_cable_usb_probe(cparams);
if (rc != URJ_STATUS_OK) {
fprintf(stderr, "JTAG cable probe failed\n");
fprintf(stderr, "JTAG cable probe failed: %s\n", urj_error_describe());
return -1;
}
cable = strdup(cparams[1]);
}
rc = urj_tap_chain_connect(jc, cable, params);
if (rc != URJ_STATUS_OK) {
fprintf(stderr, "JTAG cable detect failed\n");
fprintf(stderr, "JTAG cable detect failed: %s\n", urj_error_describe());
return -1;
}

if (freq) {
urj_tap_cable_set_frequency(jc->cable, freq);
}

return 0;
}

static int bscane2_init(const char *target, int freq)
{
urj_part_t *p;
uint32_t id;
int rc;

rc = common_jtag_init(target, freq);
if (rc < 0) {
return rc;
}

/* XXX Hard wire part 0, that might need to change (use params and detect !) */
rc = urj_tap_manual_add(jc, 6);
if (rc < 0) {
@ -264,7 +295,7 @@ static int jtag_init(const char *target)
}
urj_part_parts_set_instruction(jc->parts, "BYPASS");

jc->active_part = part = 0;
jc->active_part = 0;

p = urj_tap_chain_active_part(jc);
if (!p) {
@ -300,6 +331,69 @@ static int jtag_init(const char *target)
return 0;
}

static int ecp5_init(const char *target, int freq)
{
urj_part_t *p;
uint32_t id;
int rc;

rc = common_jtag_init(target, freq);
if (rc < 0) {
return rc;
}

/* XXX Hard wire part 0, that might need to change (use params and detect !) */
rc = urj_tap_manual_add(jc, 8);
if (rc < 0) {
fprintf(stderr, "JTAG failed to add part! : %s\n", urj_error_describe());
return -1;
}
if (jc->parts == NULL || jc->parts->len == 0) {
fprintf(stderr, "JTAG Something's wrong after adding part! : %s\n", urj_error_describe());
return -1;
}
urj_part_parts_set_instruction(jc->parts, "BYPASS");

jc->active_part = 0;

p = urj_tap_chain_active_part(jc);
if (!p) {
fprintf(stderr, "Failed to get active JTAG part\n");
return -1;
}
rc = urj_part_data_register_define(p, "IDCODE_REG", 32);
if (rc != URJ_STATUS_OK) {
fprintf(stderr, "JTAG failed to add IDCODE_REG register! : %s\n",
urj_error_describe());
return -1;
}
// READ_ID = 0xE0 = 11100000, from Lattice TN1260 sysconfig guide
if (urj_part_instruction_define(p, "IDCODE", "11100000", "IDCODE_REG") == NULL) {
fprintf(stderr, "JTAG failed to add IDCODE instruction! : %s\n",
urj_error_describe());
return -1;
}
rc = urj_part_data_register_define(p, "USER2_REG", 74);
if (rc != URJ_STATUS_OK) {
fprintf(stderr, "JTAG failed to add USER2_REG register !\n");
return -1;
}
// ER1 = 0x32 = 00110010b
if (urj_part_instruction_define(p, "USER2", "00110010", "USER2_REG") == NULL) {
fprintf(stderr, "JTAG failed to add USER2 instruction !\n");
return -1;
}
urj_part_set_instruction(p, "IDCODE");
urj_tap_chain_shift_instructions(jc);
urj_tap_chain_shift_data_registers(jc, 1);
id = urj_tap_register_get_value(p->active_instruction->data_register->out);
printf("Found device ID: 0x%08x\n", id);
urj_part_set_instruction(p, "USER2");
urj_tap_chain_shift_instructions(jc);

return 0;
}

static int jtag_reset(void)
{
return 0;
@ -339,8 +433,14 @@ static int jtag_command(uint8_t op, uint8_t addr, uint64_t *data)
return rc;
}

static struct backend jtag_backend = {
.init = jtag_init,
static struct backend bscane2_backend = {
.init = bscane2_init,
.reset = jtag_reset,
.command = jtag_command,
};

static struct backend ecp5_backend = {
.init = ecp5_init,
.reset = jtag_reset,
.command = jtag_command,
};
@ -662,7 +762,7 @@ static void ltrig_set(uint64_t addr)

static void usage(const char *cmd)
{
fprintf(stderr, "Usage: %s -b <jtag|sim> <command> <args>\n", cmd);
fprintf(stderr, "Usage: %s -b <jtag|ecp5|sim> <command> <args>\n", cmd);

fprintf(stderr, "\n");
fprintf(stderr, " CPU core:\n");
@ -706,7 +806,7 @@ int main(int argc, char *argv[])
{
const char *progname = argv[0];
const char *target = NULL;
int rc, i = 1;
int rc, i = 1, freq = 0;

b = NULL;

@ -717,9 +817,10 @@ int main(int argc, char *argv[])
{ "backend", required_argument, 0, 'b' },
{ "target", required_argument, 0, 't' },
{ "debug", no_argument, 0, 'd' },
{ "frequency", no_argument, 0, 's' },
{ 0, 0, 0, 0 }
};
c = getopt_long(argc, argv, "dhb:t:", lopts, &oindex);
c = getopt_long(argc, argv, "dhb:t:s:", lopts, &oindex);
if (c < 0)
break;
switch(c) {
@ -729,8 +830,10 @@ int main(int argc, char *argv[])
case 'b':
if (strcmp(optarg, "sim") == 0)
b = &sim_backend;
else if (strcmp(optarg, "jtag") == 0)
b = &jtag_backend;
else if (strcmp(optarg, "jtag") == 0 || strcmp(optarg, "bscane2") == 0)
b = &bscane2_backend;
else if (strcmp(optarg, "ecp5") == 0)
b = &ecp5_backend;
else {
fprintf(stderr, "Unknown backend %s\n", optarg);
exit(1);
@ -739,15 +842,22 @@ int main(int argc, char *argv[])
case 't':
target = optarg;
break;
case 's':
freq = atoi(optarg);
if (freq == 0) {
fprintf(stderr, "Bad frequency %s\n", optarg);
exit(1);
}
break;
case 'd':
debug = true;
}
}

if (b == NULL)
b = &jtag_backend;
b = &bscane2_backend;

rc = b->init(target);
rc = b->init(target, freq);
if (rc < 0)
exit(1);
for (i = optind; i < argc; i++) {
@ -789,7 +899,7 @@ int main(int argc, char *argv[])
if ((i+1) >= argc)
usage(argv[0]);
addr = strtoul(argv[++i], NULL, 16);
if (((i+1) < argc) && isdigit(argv[i+1][0]))
if (((i+1) < argc) && isxdigit(argv[i+1][0]))
count = strtoul(argv[++i], NULL, 16);
mem_read(addr, count);
} else if (strcmp(argv[i], "mw") == 0) {
@ -807,7 +917,7 @@ int main(int argc, char *argv[])
if ((i+1) >= argc)
usage(argv[0]);
filename = argv[++i];
if (((i+1) < argc) && isdigit(argv[i+1][0]))
if (((i+1) < argc) && isxdigit(argv[i+1][0]))
addr = strtoul(argv[++i], NULL, 16);
load(filename, addr);
} else if (strcmp(argv[i], "save") == 0) {

@ -84,8 +84,7 @@ entity soc is
DCACHE_TLB_NUM_WAYS : natural := 2;
HAS_SD_CARD : boolean := false;
HAS_GPIO : boolean := false;
NGPIO : natural := 32;
HAS_JTAG : boolean := false
NGPIO : natural := 32
);
port(
rst : in std_ulogic;
@ -131,13 +130,6 @@ entity soc is
gpio_dir : out std_ulogic_vector(NGPIO - 1 downto 0);
gpio_in : in std_ulogic_vector(NGPIO - 1 downto 0) := (others => '0');

-- JTAG signals
jtag_tck : in std_ulogic := '0';
jtag_tdi : in std_ulogic := '0';
jtag_tms : in std_ulogic := '0';
jtag_trst : in std_ulogic := '0';
jtag_tdo : out std_ulogic;

-- DRAM controller signals
alt_reset : in std_ulogic := '0'
);
@ -231,15 +223,15 @@ architecture behaviour of soc is
signal dmi_core_ack : std_ulogic;

-- Delayed/latched resets and alt_reset
signal rst_core : std_ulogic := '1';
signal rst_uart : std_ulogic := '1';
signal rst_xics : std_ulogic := '1';
signal rst_spi : std_ulogic := '1';
signal rst_gpio : std_ulogic := '1';
signal rst_bram : std_ulogic := '1';
signal rst_dtm : std_ulogic := '1';
signal rst_wbar : std_ulogic := '1';
signal rst_wbdb : std_ulogic := '1';
signal rst_core : std_ulogic;
signal rst_uart : std_ulogic;
signal rst_xics : std_ulogic;
signal rst_spi : std_ulogic;
signal rst_gpio : std_ulogic;
signal rst_bram : std_ulogic;
signal rst_dtm : std_ulogic;
signal rst_wbar : std_ulogic;
signal rst_wbdb : std_ulogic;
signal alt_reset_d : std_ulogic;

-- IO branch split:
@ -248,12 +240,20 @@ architecture behaviour of soc is
SLAVE_IO_ICP,
SLAVE_IO_ICS,
SLAVE_IO_UART1,
SLAVE_IO_SPI_FLASH_REG,
SLAVE_IO_SPI_FLASH_MAP,
SLAVE_IO_SPI_FLASH,
SLAVE_IO_GPIO,
SLAVE_IO_EXTERNAL,
SLAVE_IO_NONE);
signal slave_io_dbg : slave_io_type;
SLAVE_IO_EXTERNAL);
signal current_io_decode : slave_io_type;

signal io_cycle_none : std_ulogic;
signal io_cycle_syscon : std_ulogic;
signal io_cycle_uart : std_ulogic;
signal io_cycle_uart1 : std_ulogic;
signal io_cycle_icp : std_ulogic;
signal io_cycle_ics : std_ulogic;
signal io_cycle_spi_flash : std_ulogic;
signal io_cycle_gpio : std_ulogic;
signal io_cycle_external : std_ulogic;

function wishbone_widen_data(wb : wb_io_master_out) return wishbone_master_out is
variable wwb : wishbone_master_out;
@ -474,14 +474,20 @@ begin
-- Misc
variable has_top : boolean;
variable has_bot : boolean;
variable do_cyc : std_ulogic;
variable end_cyc : std_ulogic;
variable slave_io : slave_io_type;
variable match : std_ulogic_vector(31 downto 12);
begin
if rising_edge(system_clk) then
do_cyc := '0';
end_cyc := '0';
if (rst) then
state := IDLE;
wb_io_out.ack <= '0';
wb_io_out.stall <= '0';
wb_sio_out.cyc <= '0';
wb_sio_out.stb <= '0';
end_cyc := '1';
has_top := false;
has_bot := false;
else
@ -497,7 +503,7 @@ begin
wb_io_out.stall <= '1';

-- Start cycle downstream
wb_sio_out.cyc <= '1';
do_cyc := '1';
wb_sio_out.stb <= '1';

-- Copy write enable to IO out, copy address as well
@ -560,8 +566,8 @@ begin
-- Wait for new ack
state := WAIT_ACK_TOP;
else
-- We are done, ack up, clear cyc downstram
wb_sio_out.cyc <= '0';
-- We are done, ack up, clear cyc downstream
end_cyc := '1';

-- And ack & unstall upstream
wb_io_out.ack <= '1';
@ -585,7 +591,7 @@ begin
end if;

-- We are done, ack up, clear cyc downstram
wb_sio_out.cyc <= '0';
end_cyc := '1';

-- And ack & unstall upstream
wb_io_out.ack <= '1';
@ -596,144 +602,149 @@ begin
end if;
end case;
end if;

-- Create individual registered cycle signals for the wishbones
-- going to the various peripherals
if do_cyc = '1' or end_cyc = '1' then
io_cycle_none <= '0';
io_cycle_syscon <= '0';
io_cycle_uart <= '0';
io_cycle_uart1 <= '0';
io_cycle_icp <= '0';
io_cycle_ics <= '0';
io_cycle_spi_flash <= '0';
io_cycle_gpio <= '0';
io_cycle_external <= '0';
wb_sio_out.cyc <= '0';
wb_ext_is_dram_init <= '0';
wb_spiflash_is_map <= '0';
wb_spiflash_is_reg <= '0';
wb_ext_is_dram_csr <= '0';
wb_ext_is_eth <= '0';
wb_ext_is_sdcard <= '0';
end if;
if do_cyc = '1' then
-- Decode I/O address
-- This is real address bits 29 downto 12
match := "11" & wb_io_in.adr(26 downto 9);
slave_io := SLAVE_IO_SYSCON;
if std_match(match, x"FF---") and HAS_DRAM then
slave_io := SLAVE_IO_EXTERNAL;
io_cycle_external <= '1';
wb_ext_is_dram_init <= '1';
elsif std_match(match, x"F----") then
slave_io := SLAVE_IO_SPI_FLASH;
io_cycle_spi_flash <= '1';
wb_spiflash_is_map <= '1';
elsif std_match(match, x"C8---") then
-- Ext IO "chip selects"
if std_match(match, x"--00-") and HAS_DRAM then
slave_io := SLAVE_IO_EXTERNAL;
io_cycle_external <= '1';
wb_ext_is_dram_csr <= '1';
elsif (std_match(match, x"--02-") or std_match(match, x"--03-")) and
HAS_LITEETH then
slave_io := SLAVE_IO_EXTERNAL;
io_cycle_external <= '1';
wb_ext_is_eth <= '1';
elsif std_match(match, x"--04-") and HAS_SD_CARD then
slave_io := SLAVE_IO_EXTERNAL;
io_cycle_external <= '1';
wb_ext_is_sdcard <= '1';
else
io_cycle_none <= '1';
end if;
elsif std_match(match, x"C0000") then
slave_io := SLAVE_IO_SYSCON;
io_cycle_syscon <= '1';
elsif std_match(match, x"C0002") then
slave_io := SLAVE_IO_UART;
io_cycle_uart <= '1';
elsif std_match(match, x"C0003") then
slave_io := SLAVE_IO_UART1;
io_cycle_uart1 <= '1';
elsif std_match(match, x"C0004") then
slave_io := SLAVE_IO_ICP;
io_cycle_icp <= '1';
elsif std_match(match, x"C0005") then
slave_io := SLAVE_IO_ICS;
io_cycle_ics <= '1';
elsif std_match(match, x"C0006") then
slave_io := SLAVE_IO_SPI_FLASH;
io_cycle_spi_flash <= '1';
wb_spiflash_is_reg <= '1';
elsif std_match(match, x"C0007") then
slave_io := SLAVE_IO_GPIO;
io_cycle_gpio <= '1';
else
io_cycle_none <= '1';
end if;
current_io_decode <= slave_io;
wb_sio_out.cyc <= '1';
end if;
end if;
end process;
-- IO wishbone slave intercon.
-- IO wishbone slave interconnect.
--
slave_io_intercon: process(wb_sio_out, wb_syscon_out, wb_uart0_out, wb_uart1_out,
wb_ext_io_out, wb_xics_icp_out, wb_xics_ics_out,
wb_spiflash_out)
variable slave_io : slave_io_type;

variable match : std_ulogic_vector(31 downto 12);
variable ext_valid : boolean;
slave_io_intercon: process(all)
begin

-- Simple address decoder.
slave_io := SLAVE_IO_NONE;
match := "11" & wb_sio_out.adr(27 downto 10);
if std_match(match, x"FF---") and HAS_DRAM then
slave_io := SLAVE_IO_EXTERNAL;
elsif std_match(match, x"F----") then
slave_io := SLAVE_IO_SPI_FLASH_MAP;
elsif std_match(match, x"C0000") then
slave_io := SLAVE_IO_SYSCON;
elsif std_match(match, x"C0002") then
slave_io := SLAVE_IO_UART;
elsif std_match(match, x"C0003") then
slave_io := SLAVE_IO_UART1;
elsif std_match(match, x"C8---") then
slave_io := SLAVE_IO_EXTERNAL;
elsif std_match(match, x"C0004") then
slave_io := SLAVE_IO_ICP;
elsif std_match(match, x"C0005") then
slave_io := SLAVE_IO_ICS;
elsif std_match(match, x"C0006") then
slave_io := SLAVE_IO_SPI_FLASH_REG;
elsif std_match(match, x"C0007") then
slave_io := SLAVE_IO_GPIO;
end if;
slave_io_dbg <= slave_io;
wb_uart0_in <= wb_sio_out;
wb_uart0_in.cyc <= '0';
wb_uart0_in.cyc <= io_cycle_uart;
wb_uart1_in <= wb_sio_out;
wb_uart1_in.cyc <= '0';
wb_uart1_in.cyc <= io_cycle_uart1;

wb_spiflash_in <= wb_sio_out;
wb_spiflash_in.cyc <= '0';
wb_spiflash_is_reg <= '0';
wb_spiflash_is_map <= '0';
wb_spiflash_in.cyc <= io_cycle_spi_flash;
-- Clear top bits so they don't make their way to the
-- flash chip.
wb_spiflash_in.adr(27 downto 26) <= "00";

wb_gpio_in <= wb_sio_out;
wb_gpio_in.cyc <= '0';
wb_gpio_in.cyc <= io_cycle_gpio;

-- Only give xics 8 bits of wb addr (for now...)
wb_xics_icp_in <= wb_sio_out;
wb_xics_icp_in.adr <= (others => '0');
wb_xics_icp_in.adr(5 downto 0) <= wb_sio_out.adr(5 downto 0);
wb_xics_icp_in.cyc <= '0';
wb_xics_icp_in.cyc <= io_cycle_icp;
wb_xics_ics_in <= wb_sio_out;
wb_xics_ics_in.adr <= (others => '0');
wb_xics_ics_in.adr(9 downto 0) <= wb_sio_out.adr(9 downto 0);
wb_xics_ics_in.cyc <= '0';
wb_xics_ics_in.cyc <= io_cycle_ics;

wb_ext_io_in <= wb_sio_out;
wb_ext_io_in.cyc <= '0';
wb_ext_io_in.cyc <= io_cycle_external;

wb_syscon_in <= wb_sio_out;
wb_syscon_in.cyc <= '0';

wb_ext_is_dram_csr <= '0';
wb_ext_is_dram_init <= '0';
wb_ext_is_eth <= '0';
wb_ext_is_sdcard <= '0';

-- Default response, ack & return all 1's
wb_sio_in.dat <= (others => '1');
wb_sio_in.ack <= wb_sio_out.stb and wb_sio_out.cyc;
wb_sio_in.stall <= '0';
wb_syscon_in.cyc <= io_cycle_syscon;

case slave_io is
case current_io_decode is
when SLAVE_IO_EXTERNAL =>
-- Ext IO "chip selects"
--
-- DRAM init is special at 0xFF* so we just test the top
-- bit. Everything else is at 0xC8* so we test only bits
-- 23 downto 16 (21 downto 14 in the wishbone addr).
--
ext_valid := false;
if wb_sio_out.adr(27) = '1' and HAS_DRAM then -- DRAM init is special
wb_ext_is_dram_init <= '1';
ext_valid := true;
elsif wb_sio_out.adr(21 downto 14) = x"00" and HAS_DRAM then
wb_ext_is_dram_csr <= '1';
ext_valid := true;
elsif wb_sio_out.adr(21 downto 14) = x"02" and HAS_LITEETH then
wb_ext_is_eth <= '1';
ext_valid := true;
elsif wb_sio_out.adr(21 downto 14) = x"03" and HAS_LITEETH then
wb_ext_is_eth <= '1';
ext_valid := true;
elsif wb_sio_out.adr(21 downto 14) = x"04" and HAS_SD_CARD then
wb_ext_is_sdcard <= '1';
ext_valid := true;
end if;
if ext_valid then
wb_ext_io_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_ext_io_out;
end if;

wb_sio_in <= wb_ext_io_out;
when SLAVE_IO_SYSCON =>
wb_syscon_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_syscon_out;
when SLAVE_IO_UART =>
wb_uart0_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_uart0_out;
when SLAVE_IO_ICP =>
wb_xics_icp_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_xics_icp_out;
when SLAVE_IO_ICS =>
wb_xics_ics_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_xics_ics_out;
when SLAVE_IO_UART1 =>
wb_uart1_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_uart1_out;
when SLAVE_IO_SPI_FLASH_MAP =>
-- Clear top bits so they don't make their way to the
-- fash chip.
wb_spiflash_in.adr(27 downto 26) <= "00";
wb_spiflash_in.cyc <= wb_sio_out.cyc;
when SLAVE_IO_SPI_FLASH =>
wb_sio_in <= wb_spiflash_out;
wb_spiflash_is_map <= '1';
when SLAVE_IO_SPI_FLASH_REG =>
wb_spiflash_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_spiflash_out;
wb_spiflash_is_reg <= '1';
when SLAVE_IO_GPIO =>
wb_gpio_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_gpio_out;
when others =>
end case;

-- Default response, ack & return all 1's
if io_cycle_none = '1' then
wb_sio_in.dat <= (others => '1');
wb_sio_in.ack <= wb_sio_out.stb and wb_sio_out.cyc;
wb_sio_in.stall <= '0';
end if;

end process;

-- Syscon slave
@ -975,46 +986,21 @@ begin
end generate;

-- DMI(debug bus) <-> JTAG bridge
dmi_jtag: if HAS_JTAG generate
dtm: entity work.dmi_dtm_jtag
generic map(
ABITS => 8,
DBITS => 64
)
port map(
sys_clk => system_clk,
sys_reset => rst_dtm,
dmi_addr => dmi_addr,
dmi_din => dmi_din,
dmi_dout => dmi_dout,
dmi_req => dmi_req,
dmi_wr => dmi_wr,
dmi_ack => dmi_ack,
jtag_tck => jtag_tck,
jtag_tdi => jtag_tdi,
jtag_tms => jtag_tms,
jtag_trst => jtag_trst,
jtag_tdo => jtag_tdo
);
end generate;

dmi_xilinx: if not HAS_JTAG generate
dtm: entity work.dmi_dtm
generic map(
ABITS => 8,
DBITS => 64
)
port map(
sys_clk => system_clk,
sys_reset => rst_dtm,
dmi_addr => dmi_addr,
dmi_din => dmi_din,
dmi_dout => dmi_dout,
dmi_req => dmi_req,
dmi_wr => dmi_wr,
dmi_ack => dmi_ack
);
end generate;
dtm: entity work.dmi_dtm
generic map(
ABITS => 8,
DBITS => 64
)
port map(
sys_clk => system_clk,
sys_reset => rst_dtm,
dmi_addr => dmi_addr,
dmi_din => dmi_din,
dmi_dout => dmi_dout,
dmi_req => dmi_req,
dmi_wr => dmi_wr,
dmi_ack => dmi_ack
);

-- DMI interconnect
dmi_intercon: process(dmi_addr, dmi_req,

@ -50,7 +50,7 @@ architecture rtl of spi_flash_ctrl is
constant SPI_REG_INVALID : std_ulogic_vector(SPI_REG_BITS-1 downto 0) := "111";

-- Control register
signal ctrl_reg : std_ulogic_vector(15 downto 0) := (others => '0');
signal ctrl_reg : std_ulogic_vector(15 downto 0);
alias ctrl_reset : std_ulogic is ctrl_reg(0);
alias ctrl_cs : std_ulogic is ctrl_reg(1);
alias ctrl_rsrv1 : std_ulogic is ctrl_reg(2);
@ -58,7 +58,7 @@ architecture rtl of spi_flash_ctrl is
alias ctrl_div : std_ulogic_vector(7 downto 0) is ctrl_reg(15 downto 8);

-- Auto mode config register
signal auto_cfg_reg : std_ulogic_vector(29 downto 0) := (others => '0');
signal auto_cfg_reg : std_ulogic_vector(29 downto 0);
alias auto_cfg_cmd : std_ulogic_vector(7 downto 0) is auto_cfg_reg(7 downto 0);
alias auto_cfg_dummies : std_ulogic_vector(2 downto 0) is auto_cfg_reg(10 downto 8);
alias auto_cfg_mode : std_ulogic_vector(1 downto 0) is auto_cfg_reg(12 downto 11);
@ -126,9 +126,9 @@ architecture rtl of spi_flash_ctrl is
signal auto_latch_adr : std_ulogic;

-- Automatic mode latches
signal auto_data : std_ulogic_vector(wb_out.dat'left downto 0) := (others => '0');
signal auto_cnt : integer range 0 to 63 := 0;
signal auto_state : auto_state_t := AUTO_BOOT;
signal auto_data : std_ulogic_vector(wb_out.dat'left downto 0);
signal auto_cnt : integer range 0 to 63;
signal auto_state : auto_state_t;
signal auto_last_addr : std_ulogic_vector(31 downto 0);

begin
@ -351,6 +351,8 @@ begin
if rst = '1' then
auto_last_addr <= (others => '0');
auto_state <= AUTO_BOOT;
auto_cnt <= 0;
auto_data <= (others => '0');
else
auto_state <= auto_next;
auto_cnt <= auto_cnt_next;

@ -126,10 +126,10 @@ architecture rtl of spi_rxtx is
signal dat_ack_l : std_ulogic;

-- Delayed recv signal for the read machine
signal sck_recv_d : std_ulogic := '0';
signal sck_recv_d : std_ulogic;

-- Input shift register (use fifo ?)
signal ireg : std_ulogic_vector(7 downto 0) := (others => '0');
signal ireg : std_ulogic_vector(7 downto 0);

-- Bit counter
signal bit_count : std_ulogic_vector(2 downto 0);
@ -157,7 +157,7 @@ architecture rtl of spi_rxtx is
end;

type state_t is (STANDBY, DATA);
signal state : state_t := STANDBY;
signal state : state_t;
begin

-- We don't support multiple data lines at this point
@ -349,6 +349,9 @@ begin
shift_in: process(clk)
begin
if rising_edge(clk) then
if rst = '1' then
ireg <= (others => '0');
end if;

-- Delay the receive signal to match the input latch
if state = DATA then

@ -1,196 +0,0 @@
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <poll.h>
#include <signal.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#undef DEBUG

/* XXX Make that some parameter */
#define TCP_PORT 13245

static int fd = -1;
static int cfd = -1;

static void open_socket(void)
{
struct sockaddr_in addr;
int opt, rc, flags;

if (fd >= 0 || fd < -1)
return;

signal(SIGPIPE, SIG_IGN);
fd = socket(AF_INET, SOCK_STREAM, 0);
if (fd < 0) {
fprintf(stderr, "Failed to open debug socket\r\n");
goto fail;
}

rc = 0;
flags = fcntl(fd, F_GETFL);
if (flags >= 0)
rc = fcntl(fd, F_SETFL, flags | O_NONBLOCK);
if (flags < 0 || rc < 0) {
fprintf(stderr, "Failed to configure debug socket\r\n");
}

memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_port = htons(TCP_PORT);
addr.sin_addr.s_addr = htonl(INADDR_ANY);
opt = 1;
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
rc = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
if (rc < 0) {
fprintf(stderr, "Failed to bind debug socket\r\n");
goto fail;
}
rc = listen(fd,1);
if (rc < 0) {
fprintf(stderr, "Failed to listen to debug socket\r\n");
goto fail;
}
#ifdef DEBUG
fprintf(stdout, "Debug socket ready\r\n");
#endif
return;
fail:
if (fd >= 0)
close(fd);
fd = -2;
}

static void check_connection(void)
{
struct sockaddr_in addr;
socklen_t addr_len = sizeof(addr);

cfd = accept(fd, (struct sockaddr *)&addr, &addr_len);
if (cfd < 0)
return;
#ifdef DEBUG
fprintf(stdout, "Debug client connected\r\n");
#endif
}

static bool read_one_byte(char *c)
{
struct pollfd fdset[1];
int rc;

if (fd == -1)
open_socket();
if (fd < 0)
return false;
if (cfd < 0)
check_connection();
if (cfd < 0)
return false;

memset(fdset, 0, sizeof(fdset));
fdset[0].fd = cfd;
fdset[0].events = POLLIN;
rc = poll(fdset, 1, 0);
if (rc <= 0)
return false;
rc = read(cfd, c, 1);
if (rc != 1) {
#ifdef DEBUG
fprintf(stdout, "Debug read error, assuming client disconnected !\r\n");
#endif
close(cfd);
cfd = -1;
return false;
}

#ifdef DEBUG
fprintf(stdout, "Got message: %c\n", *c);
#endif

return true;
}

static void write_one_byte(char c)
{
int rc;

#ifdef DEBUG
fprintf(stdout, "Sending message: %c\r\n", c);
#endif

rc = write(cfd, &c, 1);
if (rc != 1) {
#ifdef DEBUG
fprintf(stdout, "JTAG write error, disconnecting\r\n");
#endif
close(cfd);
cfd = -1;
}
}

struct jtag_in {
uint8_t tck;
uint8_t tms;
uint8_t tdi;
uint8_t trst;
};

static struct jtag_in jtag_in;

struct jtag_in jtag_one_cycle(uint8_t tdo)
{
char c;

if (read_one_byte(&c) == false)
goto out;

// Write request
if ((c >= '0') && (c <= '7')) {
uint8_t val = c - '0';

jtag_in.tck = (val >> 2) & 1;
jtag_in.tms = (val >> 1) & 1;
jtag_in.tdi = (val >> 0) & 1;

goto out;
}

// Reset request
if ((c >= 'r') && (c <= 'u')) {
uint8_t val = c - 'r';

jtag_in.trst = (val >> 1) & 1;
}

switch (c) {
case 'B': // Blink on
case 'b': // Blink off
goto out;

case 'R': // Read request
write_one_byte(tdo + '0');
goto out;

case 'Q': // Quit request
#ifdef DEBUG
fprintf(stdout, "Disconnecting JTAG\r\n");
#endif
close(cfd);
cfd = -1;
goto out;

default:
fprintf(stderr, "Unknown JTAG command %c\r\n", c);
}

out:
return jtag_in;
}

@ -46,14 +46,6 @@ void tick(Vtoplevel *top)
void uart_tx(unsigned char tx);
unsigned char uart_rx(void);

struct jtag_in {
unsigned char tck;
unsigned char tms;
unsigned char tdi;
unsigned char trst;
};
struct jtag_in jtag_one_cycle(uint8_t tdo);

int main(int argc, char **argv)
{
Verilated::commandArgs(argc, argv);
@ -76,19 +68,10 @@ int main(int argc, char **argv)
top->ext_rst = 1;

while(!Verilated::gotFinish()) {
struct jtag_in p;

tick(top);

uart_tx(top->uart0_txd);
top->uart0_rxd = uart_rx();

p = jtag_one_cycle(top->jtag_tdo);

top->jtag_tck = p.tck;
top->jtag_tms = p.tms;
top->jtag_tdi = p.tdi;
top->jtag_trst = p.trst;
}

#if VM_TRACE

@ -24,7 +24,7 @@ entity wishbone_bram_wrapper is
end entity wishbone_bram_wrapper;

architecture behaviour of wishbone_bram_wrapper is
constant ram_addr_bits : integer := log2ceil(MEMORY_SIZE) - 3;
constant ram_addr_bits : integer := log2ceil(MEMORY_SIZE-1) - 3;

-- RAM interface
signal ram_addr : std_logic_vector(ram_addr_bits - 1 downto 0);

@ -54,9 +54,6 @@ architecture behaviour of xics_icp is

signal r, r_next : reg_internal_t;

-- hardwire the hardware IRQ priority
constant HW_PRIORITY : std_ulogic_vector(7 downto 0) := x"80";

-- 8 bit offsets for each presentation
constant XIRR_POLL : std_ulogic_vector(7 downto 0) := x"00";
constant XIRR : std_ulogic_vector(7 downto 0) := x"04";
@ -207,12 +204,14 @@ use ieee.numeric_std.all;

library work;
use work.common.all;
use work.utils.all;
use work.wishbone_types.all;
use work.helpers.all;

entity xics_ics is
generic (
SRC_NUM : integer range 1 to 256 := 16;
PRIO_BITS : integer range 1 to 8 := 8
PRIO_BITS : integer range 1 to 8 := 3
);
port (
clk : in std_logic;
@ -228,12 +227,16 @@ end xics_ics;

architecture rtl of xics_ics is

constant SRC_NUM_BITS : natural := log2(SRC_NUM);

subtype pri_t is std_ulogic_vector(PRIO_BITS-1 downto 0);
type xive_t is record
pri : pri_t;
end record;
constant pri_masked : pri_t := (others => '1');

subtype pri_vector_t is std_ulogic_vector(2**PRIO_BITS - 1 downto 0);

type xive_array_t is array(0 to SRC_NUM-1) of xive_t;
signal xives : xive_array_t;

@ -262,8 +265,15 @@ architecture rtl of xics_ics is
end function;

function prio_pack(pri8: std_ulogic_vector(7 downto 0)) return pri_t is
variable masked : std_ulogic_vector(7 downto 0);
begin
return pri8(PRIO_BITS-1 downto 0);
masked := x"00";
masked(PRIO_BITS - 1 downto 0) := (others => '1');
if unsigned(pri8) >= unsigned(masked) then
return pri_masked;
else
return pri8(PRIO_BITS-1 downto 0);
end if;
end function;

function prio_unpack(pri: pri_t) return std_ulogic_vector is
@ -276,8 +286,27 @@ architecture rtl of xics_ics is
r(PRIO_BITS-1 downto 0) := pri;
end if;
return r;
end function;
end function;

function prio_decode(pri: pri_t) return pri_vector_t is
variable v: pri_vector_t;
begin
v := (others => '0');
v(to_integer(unsigned(pri))) := '1';
return v;
end function;

-- Assumes nbits <= 6; v is 2^nbits wide
function priority_encoder(v: std_ulogic_vector; nbits: natural) return std_ulogic_vector is
variable h: std_ulogic_vector(2**nbits - 1 downto 0);
variable p: std_ulogic_vector(5 downto 0);
begin
-- Set the lowest-priority (highest-numbered) bit
h := v;
h(2**nbits - 1) := '1';
p := count_right_zeroes(h);
return p(nbits - 1 downto 0);
end function;

-- Register map
-- 0 : Config
@ -391,35 +420,33 @@ begin
end process;

irq_gen: process(all)
variable max_idx : integer range 0 to SRC_NUM-1;
variable max_idx : std_ulogic_vector(SRC_NUM_BITS - 1 downto 0);
variable max_pri : pri_t;

-- A more favored than b ?
function a_mf_b(a: pri_t; b: pri_t) return boolean is
variable a_i : unsigned(PRIO_BITS-1 downto 0);
variable b_i : unsigned(PRIO_BITS-1 downto 0);
begin
a_i := unsigned(a);
b_i := unsigned(b);
report "a_mf_b a=" & to_hstring(a) &
" b=" & to_hstring(b) &
" r=" & boolean'image(a < b);
return a_i < b_i;
end function;
variable pending_pri : pri_vector_t;
variable pending_at_pri : std_ulogic_vector(SRC_NUM - 1 downto 0);
begin
-- XXX FIXME: Use a tree
max_pri := pri_masked;
max_idx := 0;
-- Work out the most-favoured (lowest) priority of the pending interrupts
pending_pri := (others => '0');
for i in 0 to SRC_NUM - 1 loop
if int_level_l(i) = '1' and a_mf_b(xives(i).pri, max_pri) then
max_pri := xives(i).pri;
max_idx := i;
if int_level_l(i) = '1' then
pending_pri := pending_pri or prio_decode(xives(i).pri);
end if;
end loop;
max_pri := priority_encoder(pending_pri, PRIO_BITS);

-- Work out which interrupts are pending at that priority
pending_at_pri := (others => '0');
for i in 0 to SRC_NUM - 1 loop
if int_level_l(i) = '1' and xives(i).pri = max_pri then
pending_at_pri(i) := '1';
end if;
end loop;
max_idx := priority_encoder(pending_at_pri, SRC_NUM_BITS);

if max_pri /= pri_masked then
report "MFI: " & integer'image(max_idx) & " pri=" & to_hstring(prio_unpack(max_pri));
report "MFI: " & integer'image(to_integer(unsigned(max_idx))) & " pri=" & to_hstring(prio_unpack(max_pri));
end if;
icp_out_next.src <= std_ulogic_vector(to_unsigned(max_idx, 4));
icp_out_next.src <= max_idx;
icp_out_next.pri <= prio_unpack(max_pri);
end process;


Loading…
Cancel
Save