Commit Graph

21 Commits (734e4c4a529152bfd6d8a1f2e93c520e34755191)

Author SHA1 Message Date
Paul Mackerras 734e4c4a52 core: Add a short multiplier
This adds an optional 16 bit x 16 bit signed multiplier and uses it
for multiply instructions that return the low 64 bits of the product
(mull[dw][o] and mulli, but not maddld) when the operands are both in
the range -2^15 .. 2^15 - 1.   The "short" 16-bit multiplier produces
its result combinatorially, so a multiply that uses it executes in one
cycle.  This improves the coremark result by about 4%, since coremark
does quite a lot of multiplies and they almost all have operands that
fit into 16 bits.

The presence of the short multiplier is controlled by a generic at the
execute1, SOC, core and top levels.  For now, it defaults to off for
all platforms, and can be enabled using the --has_short_mult flag to
fusesoc.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Anton Blanchard 7cfbcd5514 litesdcard: Add Nexys Video support
This board has a reset line that needs to be held low to power up the
SD card hardware.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 458dfe01a6 Add liteeth support to Nexys Video
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Paul Mackerras 0fb207be60 fetch1: Implement a simple branch target cache
This implements a cache in fetch1, where each entry stores the address
of a simple branch instruction (b or bc) and the target of the branch.
When fetching sequentially, if the address being fetched matches the
cache entry, then fetching will be redirected to the branch target.
The cache has 1024 entries and is direct-mapped, i.e. indexed by bits
11..2 of the NIA.

The bus from execute1 now carries information about taken and
not-taken simple branches, which fetch1 uses to update the cache.
The cache entry is updated for both taken and not-taken branches, with
the valid bit being set if the branch was taken and cleared if the
branch was not taken.

If fetching is redirected to the branch target then that goes down the
pipe as a predicted-taken branch, and decode1 does not do any static
branch prediction.  If fetching is not redirected, then the next
instruction goes down the pipe as normal and decode1 does its static
branch prediction.

In order to make timing, the lookup of the cache is pipelined, so on
each cycle the cache entry for the current NIA + 8 is read.  This
means that after a redirect (from decode1 or execute1), only the third
and subsequent sequentially-fetched instructions will be able to be
predicted.

This improves the coremark value on the Arty A7-100 from about 180 to
about 190 (more than 5%).

The BTC is optional.  Builds for the Artix 7 35-T part have it off by
default because the extra ~1420 LUTs it takes mean that the design
doesn't fit on the Arty A7-35 board.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 45cd8f4fc3 core: Add support for floating-point loads and stores
This extends the register file so it can hold FPR values, and
implements the FP loads and stores that do not require conversion
between single and double precision.

We now have the FP, FE0 and FE1 bits in MSR.  FP loads and stores
cause a FP unavailable interrupt if MSR[FP] = 0.

The FPU facilities are optional and their presence is controlled by
the HAS_FPU generic passed down from the top-level board file.  It
defaults to true for all except the A7-35 boards.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
Benjamin Herrenschmidt 02abb135a8 litedram: l2: Add support for more geometries
Make the DRAM data lines and user port width configurable, also
don't hard wire dependency on the wishbone data width.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago
Benjamin Herrenschmidt b0241d9f2d corefile/nexys_video: Parameter fixes
This fixes up a few issues with parameters:

Only arty has "has_uart1" since we haven't added plumbing for a second UART
anywhere else. Also "uart_is_16550" was mixing on one of the nexys_video
targets, and nexys_video toplevel was missing LOG_LENGTH.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago
Benjamin Herrenschmidt a5fa92f71b fpga: nexys-video: Wire up core_alt_reset
It looks like we left it dangling

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago
Benjamin Herrenschmidt 5449d842dd nexys_video: Fix nexys-video build
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago
Benjamin Herrenschmidt fb5c16d05e uart: Make 16550 the default
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago
Benjamin Herrenschmidt 7575b1e0c2 uart: Import and hook up opencore 16550 compatible UART
This imports via fusesoc a 16550 compatible (ie "standard") UART,
and wires it up optionally in the SoC instead of the potato one.

This also adds support for a second UART (which is always a
16550) to Arty, wired to JC "bottom" port.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago
Benjamin Herrenschmidt f9f18906a3 soc: Rename wb_dram_ctrl to wb_ext_io and rework decoding
This makes the control bus currently going out of "soc" towards
litedram more generic for external IO devices added by the
top-level rather than inside the SoC proper.

This is mostly renaming of signals and a small change on how the
address decoder operates, using a separate "cascaded" decode for
the external IOs.

We make the region 0xc8nn_nnnn be the "external IO" region for
now.

This will make it easier / cleaner to add more external devices.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago
Benjamin Herrenschmidt 4244b54984 soc: Remove unused RESET_LOW generic
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago
Benjamin Herrenschmidt cc4dcb3597 spi: Add SPI Flash controller
This adds an SPI flash controller which supports direct
memory-mapped access to the flash along with a manual
mode to send commands.

The direct mode can be set via generic to default to single
wire or quad mode. The controller supports normal, dual and quad
accesses with configurable commands, clock divider, dummy clocks
etc...

The SPI clock can be an even divider of sys_clk starting at 2
(so max 50Mhz with our typical Arty designs).

A flash offset is carried via generics to syscon to tell SW about
which portion of the flash is reserved for the FPGA bitfile. There
is currently no plumbing to make the CPU reset past that address (TBD).

Note: Operating at 50Mhz has proven unreliable without adding some
delay to the sampling of the input data. I'm working in improving
this, in the meantime, I'm leaving the default set at 25 Mhz.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago
Benjamin Herrenschmidt a93d9e77c9 litedram: Remove remnants of riscv-inits
We still had some wires bringing an extra serial port out of
litedram for the built-in riscv processor. This is all gone now
so take them out.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago
Benjamin Herrenschmidt bf1b98b958 litedram: Add support for booting without BRAM
This adds an option to disable the main BRAM and instead copy a
payload stashed along with the init code in the secondary BRAM
into DRAM and boot from there

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago
Benjamin Herrenschmidt 573b6b4bc4 soc: Rework interconnect
This changes the SoC interconnect such that the main 64-bit wishbone out
of the processor is first split between only 3 slaves (BRAM, DRAM and a
general "IO" bus) instead of all the slaves in the SoC.

The IO bus leg is then latched and down-converted to 32 bits data width,
before going through a second address decoder for the various IO devices.

This significantly reduces routing and timing pressure on the main bus,
allowing to get rid of frequent timing violations when synthetizing on
small'ish FPGAs such as the Artix-7 35T found on the original Arty board.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago
Benjamin Herrenschmidt 7560e8f2ff arty/nexys: Rework reset with litedram
When using litedram, request a much longer PLL reset. This seems to
help get rid of all the grabled output after config.

Also use the clean system_rst out of litedram as our source of reset
for the rest of the SoC (it is synchronized with system_clk and takes
pll_locked into account already)
4 years ago
Benjamin Herrenschmidt c19b5b8cc7 litedram: Update to new LiteX/LiteDRAM version
Things have changed a bit in upstream LiteX. LiteDRAM now exposes a
wishbone for the CSRs for example.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago
Benjamin Herrenschmidt 025cf5efe8 syscon: Add syscon registers
These provides some info about the SoC (though it's still somewhat
incomplete and needs more work, see comments).

There's also a control register for selecting DRAM vs. BRAM at 0
(and for soft-resetting the SoC but that isn't wired up yet).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago
Benjamin Herrenschmidt 2cef3005cd fpga: Hookup nexys-video to litedram
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4 years ago