Work Group name, and Work Product track (both in second paragraph). -->
<abstract>
<!-- TODO: remove "phrase" tags (2) and text below and insert proper information -->
<para>The purpose of this document is to describe how to enable a new customer card on CAPI SNAP framework. SNAP is a open-sourced programming framework for FPGA Acclerations. Its homepage is <link xlink:href="https://github.com/open-power/snap">https://github.com/open-power/snap</link>. With it, you can develop accelerations with Power and CAPI technology easily.</para>
<para>The purpose of this document is to describe how to enable a new customer card to support CAPI SNAP framework. SNAP is a open-source programming framework for FPGA Accelerations. Its homepage is <link xlink:href="https://github.com/open-power/snap">https://github.com/open-power/snap</link>. With it, you can develop accelerators with CAPI technology easily.</para>
<para>This document describes the flow and steps to enable a new PCIe FPGA card to have or CAPI2.0 capability. Firstly, please check whether your PCIe FPGA card is listed on today's "SNAP enabled cards" (On the homepage README of <link xlink:href="https://github.com/open-power/snap">SNAP Github</link>), if not, this document will guide you on <emphasis role="bold">how to enable it</emphasis>. Since all of the project files are open-sourced, you can create a Github repository fork, and create a new board support package (BSP) and walk through the entire working flow to enable SNAP. </para>
<para>This document describes the flow and steps to enable a new PCIe FPGA card to be able to run in CAPI2.0 mode, and to support SNAP framework. If your PCIe FPGA card is not listed on today's available "SNAP enabled cards" (On the homepage README of <link xlink:href="https://github.com/open-power/snap">SNAP Github</link>), this document will guide you on <emphasis role="bold">how to enable it</emphasis>. Since all of the project files are open-source, you can create a Github repository fork, and create a new board support package (BSP) and walk through the working flow to enable SNAP. </para>
<para>This document is a Workgroup Note owned by the Acceleration Workgroup and handled in compliance with the requirements outlined in the
<para>This document is a Workgroup Note owned by the System Software Workgroup and handled in compliance with the requirements outlined in the
<citetitle>OpenPOWER Foundation Work Group (WG) Process</citetitle> document. It was
created using the <citetitle>Master Template Guide</citetitle> version &template_version;.
<para>Each card supplier may design their FPGA board with different FPGA chips, circuit components, memory and IOs, so BSP (Board support package) is different from card to card. That's why an open-sourced project is so helpful: It allows card supplier and every developer to explore the functions of the card freely, and get benefits from CAPI technology. </para>
<para> At least Flash interface pins and PCIe interface pins are required to be assigned in xdc files precisely.</para>
<para> Between capi_bsp_wrap and user AFU (psl_accel), there are 6 groups of signals: Command, DMA, Buffer, Response, MMIO and Control. Please refer to <link xlink:href="http://openpowerfoundation.org/wp-content/uploads/resources/v2-psl-afu-spec/content/ch_preface.html"> CAPI2.0 PSL/AFU interface Spec</link> for the details. </para>
<note><para>The logic in CAPI2.0 <emphasis role="bold">snap_core</emphasis> implemented the data path with DMA interface. Buffer interface is not used. </para></note>
</section>
</section>
<section><title>Step-by-step guidance</title>
<section><title>Work on github</title>
<para>capi2-bsp is a public Github repository. You need to have a Github account first. Then create a "fork" (Click the "fork" button) on <link xlink:href="https://github.com/open-power/capi2-bsp">https://github.com/open-power/capi2-bsp</link>.</para>
<note><para>Actually capi2-bsp is a submodule of snap. That will be introduced later.</para></note>
<para>Keep working on your own capi2-bsp fork, when it has been validated to work well, submit a pull request to "open-power/capi2-bsp" and require merging into the public upstream.</para>
<note><para>capi2-bsp is also a submodule of snap.</para></note>
<para>Keep working on your own capi2-bsp fork, when it has been validated to work well, submit a pull request to "open-power/capi2-bsp" and request merging into the public upstream.</para>
</section>
<section><title>Preparations</title>
<para>First, define a FPGACARD name. It can start from the company's name, followed with the card name and be short. For example. ADKU3 = Alpha-Data ADM-PCIE-KU3. Get information from the card supplier.</para>
<para>First, define a FPGACARD name. It can start from the company's name, following with the card name and be short. For example, ADKU3 = Alpha-Data ADM-PCIE-KU3. Get information from the card supplier.</para>
<important><para> Make sure the information in xdc/tcl files are permitted to be open-source. </para></important>
<para>There are some other modifications you should pay attention to:</para>
<orderedlist>
<listitem> <para>Send email to OpenPower Acceleration Workgroup or contact your representative to apply for a <emphasis role="bold">subsystem device ID</emphasis> for the new card. For example, ADKU3 uses 0x0605. S241 uses 0x0660. The information needs to be filled in "[FPGACARD]/tcl/create_ip.tcl", <userinput>CONFIG.PF0_SUBSYSTEM_ID</userinput></para>
</listitem>
<listitem> <para> As a CAPI device, you need to make sure PF0 (physical function) has <userinput>PF0_DEVICE_ID=0477</userinput> and <userinput>PF0_SUBSYSTEM_VENDOR_ID=1014</userinput>. This is required by the linux kernel module cxl <link xlink:href="https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/tree/drivers/misc/cxl/pci.c">(pci.c)</link>, otherwise the card will not be recognized as a CAPI card by system. </para>
<listitem><para><emphasis role="underline">PCIe core IP creation:</emphasis></para>
<itemizedlist>
<listitem><para>"Vendor ID" and "Device ID" have to be 0x1014 and 0x0477, so kernel module cxl can recognize the card as a CAPI device. (See in <link xlink:href="https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/tree/drivers/misc/cxl/pci.c">pci.c</link>)</para></listitem>
<listitem><para>If the card vendor has a code allocated by PCISIG (See in <link xlink:href="https://pcisig.com/membership/member-companies">PCISIG Member companies</link>), use it as "Subsystem Vendor ID". "Subsystem Device ID" can be chosen freely. </para></listitem>
<listitem><para>If the card vendor doesn't have a code allocated by PCISIG, or just for testing and evaluation purpose, please use default "Subsystem Vendor ID" = 0x1014, and send email to <email>aclwg-chair@openpowerfoundation.org</email> to get a distinct "Subsystem Device ID" to differentiate this card from others.</para>
<para>The corresponding "Subsytem Vendor ID" and "Subsystem Device ID" need to be added into <link xlink:href="https://github.com/ibm-capi/capi-utils">capi-utils</link>, file "psl-devices".</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>If you are using Xilinx VU33P or VU37P who have HBM, this is actually a new FPGA family "virtexuplushbm". Or if you are using other new FPGA Production family, additional steps need to take:</para>
<listitem><para><emphasis role="underline">Product Family support:</emphasis></para>
<para>If the FPGA chip types are Xilinx VU33P or VU37P who have HBM, this is actually a new FPGA family <userinput>virtexuplushbm</userinput>. For a new FPGA Production family, additional steps need to take:</para>
<itemizedlist>
<listitem><para>capi2-bsp/psl/create_ip.tcl: "set_property supported_families ...", add new family name like "virtexuplushbm Production"</para></listitem>
<listitem><para>capi2-bsp/common/tcl/create_capi_bsp.tcl: "set_property supported_families ...", do the same as above.</para></listitem>
<listitem><para>Add family support to PSL9 ZIP package: unzip the package, do the hacking, and zip them back again.</para>
<screen>unzip ibm.com_CAPI_PSL9_WRAP_2.00.zip
<listitem><para>"capi2-bsp/psl/create_ip.tcl": <userinput>set_property supported_families ...</userinput>, add new family name like "virtexuplushbm Production"</para></listitem>
<listitem><para>"capi2-bsp/common/tcl/create_capi_bsp.tcl": <userinput>set_property supported_families ...</userinput>, do the same as above.</para></listitem>
<listitem><para>Add family support to PSL9 ZIP package: unzip the package, do the modifications, and zip them back again. Commands:</para>
<screen>$ unzip ibm.com_CAPI_PSL9_WRAP_2.00.zip
(modify compnent.xml to add new family name, search "supportedFamilies")
zip -r ibm.com_CAPI_PSL9_WRAP_2.00.zip component.xml src/ xgui/
rm -fr component.xml src/ xgui/</screen>
$ zip -r ibm.com_CAPI_PSL9_WRAP_2.00.zip component.xml src/ xgui/
$ rm -fr component.xml src/ xgui/</screen>
</listitem>
</itemizedlist>
</listitem>
<listitem><para><emphasis role="bold">VSEC starting address: </emphasis> VSEC (Vendor Specific Extended Capability Structure) is a part of PCIe capability list architecture. It needs to be properly linked in PCIe config space. File "capi2-bsp/[FPGACARD]/src/capi_vsec.vhdl, vsec_addr[21:32] defines the address for VSEC. It should be matched with PCIe core value <emphasis role="bold">PF0_SECONDARY_PCIE_CAP_NEXTPTR</emphasis>. Take card U200 for example, its vsec_addr[21:32] starts from 12'h400 (12'b0100_0000_0000), and there is a patch_ip.tcl to modify the pcie4 core's default value 12'h480 to 12'h400. (Actually this is a historical workaround, because Xilinx had modified its extended capability starting address from 12'h400 to 12'h480 on one version. The patch is "tcl/patch_ip.tcl" </para>
<para>VSEC (Vendor Specific Extended Capability Structure) is a part of PCIe capability list architecture. It needs to be properly linked in PCIe config space. </para> <para>"capi2-bsp/[FPGACARD]/src/capi_vsec.vhdl": <userinput>vsec_addr[21:32]</userinput> defines the address for VSEC. It should be matched with PCIe core value <userinput>PF0_SECONDARY_PCIE_CAP_NEXTPTR</userinput>. Take card U200 for example, its <userinput>vsec_addr[21:32]</userinput> starts from 12'h400 (12'b0100_0000_0000), and "tcl/patch_ip.tcl" modifies it from default value 12'h480 to 12'h400." </para>
<para>About Xilinx PCIe code information for extended configuration space, you can find it on PG156 (for Ultrascale Device) or PG213 (for Ultrascale+ Device). </para>
<para>For Ultrascale+ HBM device, pcie4c core, the VSEC starts from 12'hE80. At this time vsec_addr must be changed in capi_vsec.vhdl. And the above two lines in patch_ip.tcl should be disabled.</para>
<para>Xilinx PCIe code information for extended configuration space can be found on <link xlink:href="https://www.xilinx.com/support/documentation/ip_documentation/pcie3_ultrascale/v4_4/pg156-ultrascale-pcie-gen3.pdf">PG156 (for Ultrascale Device)</link> or <link xlink:href="https://www.xilinx.com/support/documentation/ip_documentation/pcie4_uscale_plus/v1_3/pg213-pcie4-ultrascale-plus.pdf">PG213 (for Ultrascale+ Device)</link>.</para>
<para>For Ultrascale+ HBM device's pcie4c core, the VSEC starts from 12'hE80. At this time <userinput>vsec_addr[21:32]</userinput> must be changed in "capi_vsec.vhdl". And the above two lines in "patch_ip.tcl" are not needed anymore.</para>
</listitem>
<listitem>
<para><emphasis role="bold">Vital Product Data: </emphasis>Source files under "capi2-bsp/[FPGACARD]/src": capi_vsec.vhdl. <emphasis role="bold">This step is optional.</emphasis> Edit the hardcoded "vpd44data_be" to add VPD (Vital Product Data) information. Ideally this information should be read from an I2C EEPROM. The FPGA supplier wrote the content of EEPROM before shipping. However, today we take the simpliest way to write some hard coded value. "capi2-bsp/common/script" has a script "gen_vsec.sh" to help you do this. </para>
<para><emphasis role="underline">Vital Product Data: </emphasis><emphasis role="bold">This step is optional.</emphasis></para>
<para>"capi2-bsp/[FPGACARD]/src/capi_vsec.vhdl": Edit the hardcoded <userinput>vpd44data_be</userinput> to add VPD (Vital Product Data) information. Ideally this information should be read from an I2C EEPROM. The FPGA supplier wrote the content of EEPROM before shipping. However, today we take the simpliest way to write some hard coded value. "capi2-bsp/common/script" has a script "gen_vsec.sh" to do this. </para>
</listitem>
<listitem>
<para><emphasis role="bold">User Image Address: </emphasis>Source files under "capi2-bsp/[FPGACARD]/src": capi_xilmltbt.vhdl. Edit the User image starting address "wbstart_addr". </para>
<para>"capi2-bsp/[FPGACARD]/src/capi_xilmltbt.vhdl": Edit the User image starting address <userinput>wbstart_addr</userinput>. </para>
<screen>wbstart_addr <= "User_image_address" when (cpld_user_bs_req = '1') else "00000000000000000000000000000000";</screen>
<para>capi_xilmltbt.vhdl has a Xilinx multi-boot core. That means you can create two kinds of images: Factory image and User image. Factory images will be placed at address 0 of FPGA Flash, and User image will be placed at "User_image_address" on the flash. When power-on or the FPGA card is reset, the multiboot core knows where to load the image. Usually we put a Golden factory image on address 0 and never change it, and multiboot core always tries to load user image first. If the user image has something wrong, multiboot logic will tell the FPGA to "fallback" to factory image. You still see the card in the system and you can just program a new user image to try again. </para>
<para>"capi_xilmltbt.vhdl" has a Xilinx multi-boot core. That means you can create two kinds of images: Factory image and User image. Factory images will be placed at address 0 of FPGA Flash, and User image will be placed at "User_image_address" on the flash. When power-on or the FPGA card is reset, the multiboot core knows where to load the image. Usually we put a Golden factory image on address 0 and never change it, and multiboot core always tries to load user image first. If the user image has something wrong, multiboot logic will tell the FPGA to "fallback" to factory image. You still see the card in the system and you can just program a new user image to try again. </para>
</listitem>
<listitem>
<para> Check Vivado Version. Make sure this version of Vivado tool supports the FPGA part name you have assigned in "capi2-bsp/[FPGACARD]/Makefile". For some very new FPGA chip types, in one Vivado version they may have a suffix of "es" (engineering sample), and in a newer Vivado version the "es" suffix is removed.</para>
<para>Make sure this version of Vivado tool supports the FPGA part name you have assigned in "capi2-bsp/[FPGACARD]/Makefile". For some very new FPGA chip types, in one Vivado version they may have a suffix of "es" (engineering sample), and in a newer Vivado version the "es" suffix is removed.</para>
</listitem>
</orderedlist>
</section>
<section><title>Generate capi_bsp_wrap</title>
<screen>cd capi2-bsp
make [FPGACARD]</screen>
<para>If it is successfully done, the generation of BSP for CAPI2.0 is completed. For HDK developers, they can create their own Vivado project and import capi_bsp_wrap as an IP. But for SNAP developers, there are some other work to do, see in next chapter.</para>
<para>If it is successfully done, the generation of BSP for CAPI2.0 is completed. Developers using HDK mode can create their own Vivado project and import capi_bsp_wrap as an IP. But for SNAP developers there are some other work to do, see in next chapter.</para>
<para>Snap is also a public Github repository. Create a "fork" (Click the "fork" button) on <link xlink:href="https://github.com/open-power/snap">https://github.com/open-power/snap</link>. Keep working on your own snap fork, when it works, submit a pull request to "open-power/snap" and require merging into the public upstream.</para>
@ -34,14 +34,16 @@ git submodule update</screen>
<section><title>SNAP structure</title>
<para>On the FPGA side, there are three parts that need to consider when moving to a new FPGA card. They are (a) BSP, (b) snap_core, (c) DDR memory controller (mig). And there are also some components in SNAP need to be updated for a new FPGA card. </para>
<para> SNAP also includes the software part. The following picture shows the SNAP github repository folders and files: </para>
</figure>
<note><para>Module <userinput>snap_core</userinput> on CAPI2.0 implemented the data path with DMA interface. Buffer interface is not used. </para></note>
<para>The following picture shows the SNAP github repository folders and files.</para>
<figure pgwide="1" xml:id="snap-str">
<title>Repository structure</title>
<mediaobject>
@ -51,14 +53,14 @@ git submodule update</screen>
</mediaobject>
</figure>
<para>All of the user-developed accelerators are in "<emphasis role="bold">actions</emphasis>" directory. There are already some examples there. Each "action" has its "sw", "hw", "tests", and other sub-directories. The hardware part uses "action_wrapper" as its top.</para>
<para>Then back to ${SNAP_ROOT}, "<emphasis role="bold">software</emphasis>" directory includes libsnap, header files and some tools. "<emphasis role="bold">hardware</emphasis>" directory is the main focus. <emphasis role="bold">deconfig</emphasis> has the config files for silent testing purpose, and <emphasis role="bold">scripts</emphasis> has the menu settings and other scripts. </para>
<para>"<emphasis role="bold">software</emphasis>" directory includes libsnap, header files and some tools. "<emphasis role="bold">hardware</emphasis>" directory is the main focus. "<emphasis role="bold">deconfig</emphasis>" has the config files for silent testing purpose, and "<emphasis role="bold">scripts</emphasis>" has the menu settings and other scripts. </para>
<para>
How does SNAP work and what are the files used in each step?
</para>
<itemizedlist>
<listitem><para><emphasis role="bold">make snap_config</emphasis>: The menu to select cards and other options is controlled by "script/Kconfig"</para></listitem>
<listitem><para><userinput>make snap_config</userinput>: The menu to select cards and other options is controlled by "script/Kconfig"</para></listitem>
<listitem>
<para><emphasis role="bold">make model</emphasis>: This step creates a Vivado project. It firstly calls "hardware/setup/<emphasis role="bold">create_snap_ip.tcl</emphasis>" to generate the IP files in use, then calls "hardware/setup/<emphasis role="bold">create_framework.tcl</emphasis>" to build the project. About create_framework.tcl: </para>
<para><userinput>make model</userinput>: This step creates a Vivado project. It firstly calls "hardware/setup/create_snap_ip.tcl" to generate the IP files in use, then calls "hardware/setup/create_framework.tcl" to build the project. About "create_framework.tcl": </para>
<itemizedlist>
<listitem>
<para>It adds BSP (board support package). In CAPI1.0, it is also called PSL Checkpoint file (b_route_design.dcp) or base_image. It uses the path pointed to b_route_design.dcp and adds it into the design. In CAPI2.0, it will call the make process in capi2-bsp submodule to generate "capi_bsp_wrap" if it doesn't exist. If you have already successfully generated it, this step is skipped. Then "create_framework.tcl" adds the capi_bsp_wrap (xcix or xci file) into the design.</para>
@ -67,19 +69,19 @@ git submodule update</screen>
<para>It adds FPGA top files and snap_core files (in hardware/hdl/core).</para>
</listitem>
<listitem>
<para>It adds constrain files: in hardware/setup/${FPGACARD} or in hardware/capi2-bsp/${FPGACARD}</para>
<para>It adds constrain files: in "hardware/setup/[FPGACARD]" or in "hardware/capi2-bsp/[FPGACARD]"</para>
</listitem>
<listitem>
<para>It adds user files (in actions/${ACTION_NAME}/hw). User's action hardware uses top file named "action_wrapper.vhd"</para>
<para>It adds user files (in "actions/[ACTION_NAME]/hw"). User's action hardware uses top file named "action_wrapper.vhd"</para>
</listitem>
<listitem>
<para>It adds simulation files (in hardware/sim/core) including simulation top files and simulation models. (If "no_sim" is selected in snap_config menu, this step is skipped.)</para>
<para>It adds simulation files (in "hardware/sim/core") including simulation top files and simulation models. (If <userinput>no_sim</userinput> is selected in snap_config menu, this step is skipped.)</para>
</listitem>
</itemizedlist>
<para>After above steps, "<emphasis role="bold">viv_project</emphasis>" is created. You can open it with Vivado GUI, and check the design hierarchy. And it will call the selected simulator to compile the simulation model.</para>
<para>After above steps, "<emphasis role="bold">hardware/viv_project</emphasis>" is created. You can open it with Vivado GUI, and check the design hierarchy. And it will call the selected simulator to compile the simulation model.</para>
</listitem>
<listitem>
<para><emphasis role="bold">make image</emphasis>: This step runs synthesis, implementation and bitstream generation. It calls "hardware/setup/<emphasis role="bold">snap_build.tcl</emphasis>" and also uses some related tcl scripts to work on "viv_project". In this step, "hardware/build" will be created and the output products like bit images, checkpoints (middle products for debugging) and reports (reports of timing, clock, IO, utilization, etc.) If everything runs well and timing passes, user will get the bitstream files (in "<emphasis role="bold">build</emphasis>/Images" sub directory) to program the FPGA card. </para>
<para><userinput>make image</userinput>: This step runs synthesis, implementation and bitstream generation. It calls "hardware/setup/snap_build.tcl" and also uses some related tcl scripts to work together. In this step, "<emphasis role="bold">hardware/build</emphasis>" will be created and the output products like bit images, checkpoints (middle products for debugging) and reports (reports of timing, clock, IO, utilization, etc.) If everything runs well and timing passes, user will get the bitstream files (in "build/Images" sub directory) to program the FPGA card. </para>
</listitem>
</itemizedlist>
</section>
@ -96,14 +98,14 @@ git submodule update</screen>
<note>
<para>If you meet files ending with "_source", like "psl_fpga.vhd_source", that means this file will be pre-processed to generate the output file without "_source" suffix, like "psl_fpga.vhd". There are <userinput>#ifdef</userinput> macros or comments like <userinput>-- only for NVME_USED=TRUE</userinput>. They help to create a target VHDL/Verilog file with different configurations.</para>
</note>
<para>Below lists the files to change. There may be some differences with new commits in SNAP git repository. Keep in mind they include: </para>
<para>Below lists the files to change:</para>
<itemizedlist>
<listitem><para>snap_config and environmental files</para></listitem>
<listitem><para>Hardware: psl_accel and psl_fpga (top) RTL files</para></listitem>
<listitem><para>Hardware: tcl files for the workflow</para></listitem>
<listitem><para>Hardware: Board: xdc files for IO/floorplan/clock/bitstream</para></listitem>
<listitem><para>Hardware: DDR: create DDR Memory controller IP (mig) in create_snap_ip.tcl, create DDR memory sim model, and other xdc files</para></listitem>
<listitem><para>Hardware: Other IP: create_ip, sim model, xdc files</para></listitem>
<listitem><para>Hardware: xdc files for IO, floorplan, clock and bitstream settings</para></listitem>
<listitem><para>Hardware: create DDR Memory controller IP (mig) in create_snap_ip.tcl, create DDR memory sim model, and other xdc files</para></listitem>
<listitem><para>Hardware: create_ip, sim model and xdc files for other IPs</para></listitem>
<listitem><para>Software: New card type, register definition</para></listitem>
<listitem><para>Readme and Documents</para></listitem>
@ -244,8 +246,8 @@ git submodule update</screen>
<para>capi-utils is the third git repository that needs a few modifications. Same as before, fork it, make the modifications and submit a pull request.</para>
<para>The first column is the SUBSYSTEM_ID, the second column is the Card name, the third is the FPGA Chip Vendor, then it is the User Image starting address on the flash. For SPI device, size of block is 64Bytes. "SPIx4" is the flash interface type. It may also be "DPIx16" or "SPIx8". </para>
<para>It lists the Subsystem Vendor ID, Subsystem Device ID, Card name, FPGA chip, then it is the "User_image_address" on the flash. For SPI device, size of block is 64Bytes. "SPIx4" is the flash interface type. It may also be "DPIx16" or "SPIx8". </para>
<para>"SPIx8" uses two bitstreams so another starting address also needs to be provided. And when you call "capi-flash-script" to program the flash, it needs two input bitstream files (primary and secondary).</para>
<listitem><para>Generate capi_bsp_wrap in capi2-bsp.</para></listitem>
<listitem><para>Make modifications to snap git repository as described above.</para></listitem>
<listitem><para>Select an action example without DDR, for example: hls_helloworld. </para></listitem>
<listitem><para>Go through the "make model" and "make image" processes and build the bitstream files. </para></listitem>
<listitem><para>Go through the <userinput>make model</userinput> and <userinput>make image</userinput> processes and build the bitstream files. </para></listitem>
<listitem><para>Plug the card onto Power9 server and connect a JTAG/USB cable to a laptop. Install Vivado Lab on this laptop (it requires Windows or Linux operating system). Start Vivado Lab tool, open Hardware manager.</para></listitem>
<listitem><para>Power on the server. You will see the FPGA target is recognized by Vivado Lab tool.</para></listitem>
<listitem><para>Program the generated bitstream files (bin or mcs) to the card. On Vivado Lab tool, select the FPGA chip and right-click, choose "Add Configuration Memory Device..." and program the bin/mcs files to the flash. See in picture <xref linkend="vivado-lab"/> and <xref linkend="jtag"/> </para></listitem>
<listitem><para>Program the generated bitstream files (bin or mcs) to the card. On Vivado Lab tool, select the FPGA chip and right-click, choose "Add Configuration Memory Device..." and program the bin or mcs files to the flash. See in picture <xref linkend="vivado-lab"/> and <xref linkend="jtag"/> </para></listitem>
<listitem><para>Wait it done (It may take 10 minutes). Unplug the JTAG/USB cable, reboot the server.</para></listitem>
<listitem><para>After the server is booted, log into OS, run <userinput>lspci</userinput> to see if the card is there. (Usually with Device ID 0x0477). Then download snap, capi-utils, libcxl (from github). Go to snap directory, <userinput>make apps</userinput> and run the application. </para></listitem>
</orderedlist>
<note><para>There is another way to replace step 6 to 8. We call it <emphasis role="bold">"Fast program bit-file when power on"</emphasis>. Prepare the <emphasis role="bold">bit</emphasis> file on laptop in advance. Not like bin/mcs files which are for the flash, the bit file is used to program the FPGA chip directly. When the server is powered on, after Vivado Lad sees the FPGA, right click the device, <userinput>program device ...</userinput> and select the bit file <emphasis role="bold">immediately</emphasis>. This action only takes about 10 seconds and can be done before hostboot on the server starts to scan PCIe devices.</para>
<para>You should be aware of the fact that because only FPGA chip is programmed, (the flash memory is empty), when the server is powered off or reboot, FPGA doesn't have electricity so the programming in FPGA chip will be lost.</para>
<note><para>There is another way to replace step 6 to 8. We call it <emphasis role="underline">"Fast program bit-file when power on"</emphasis>. Prepare the <emphasis role="bold">bit</emphasis> file on laptop in advance. Not like bin/mcs files which are for the flash, the bit file is used to program the FPGA chip directly. When the server is powered on, after Vivado Lab sees the FPGA, right click the device, "program device..." and select the bit file <emphasis role="bold">immediately</emphasis>. This action only takes about 10 seconds and can be done before skiboot on the server starts to scan PCIe devices.</para>
<para>Be aware of the fact that now only FPGA chip is programmed, (the flash memory is still empty or holding old data), so when the server is powered off or reboot the recent programming to FPGA chip will be lost.</para>
<para>When you download and install <emphasis role="bold">Vivado Lab</emphasis>, please pick up as same version as the Vivado (SDx) that you are using to build images. </para>
<para>When you download and install <emphasis role="bold">Vivado Lab</emphasis>, please choose as same version as the Vivado tool that you were using to build images. </para>
</important>
<para> <emphasis role="bold">Tips to help you debug:</emphasis></para>
<para> <emphasis role="underline">Tips to help you debug:</emphasis></para>
<orderedlist>
<listitem><para>Seeing 0477 by "lspci" is the most important milestone. If not, please check file "<emphasis role="bold">/sys/firmware/opal/msglog</emphasis>" to see whether there are link training failed messages. A successful message looks like this, which means this PCIe device has been scanned and recognized. The number followed "PHB#" is the PCIe device identifier in the format of "domain:bus:slot.func". You can see it by "lspci" also.:</para>
<listitem><para>Seeing 0477 by <userinput>lspci</userinput> is the most important milestone. If not, please check file "<emphasis role="bold">/sys/firmware/opal/msglog</emphasis>" to see whether there are link training failed messages. A successful message looks like this, which means this PCIe device has been scanned and recognized. The number followed "PHB#" is the PCIe device identifier in the format of "domain:bus:slot.func":</para>
<listitem> <para> Check <emphasis role="bold">dmesg</emphasis>. Run "dmesg > dmesg.log" and search "cxl" in dmesg.log file. A normal output should be look like this</para>
<listitem> <para> Check <userinput>dmesg</userinput>. Run "dmesg > dmesg.log" and search "cxl" in "dmesg.log" file. A normal output should be look like this:</para>
<screen>[ 9.301403] cxl-pci 0000:01:00.0: Device uses a PSL9
<listitem><para>Today most of the linux kernel versions already include cxl module. If you doubt this, please check by</para>
<listitem><para>Today most of the linux kernel versions already include cxl module. You can doublecheck it by:</para>
<screen>modinfo cxl</screen>
</listitem>
<listitem> <para> Check create_ip.tcl in capi2-bsp/[FPGACARD]/tcl and check the configuration of PCIe core.</para>
<para>If your PCIe device has been recognized as CAPI, do "ls /dev/cxl" and you can see "afu*" devices. Then your application software can open the device like operating an ordinary file.</para>
<listitem>
<para>If your PCIe device has been recognized as CAPI, <prompt>ls /dev/cxl</prompt> and you can see "afu*" devices. Then your application software can open the device like operating an ordinary file.</para>
<screen>ls /dev/cxl
afu0.0m afu0.0s</screen>
<para>Some other useful commands to check PCIe config (with the right PCIe identifier "domain:bus:slot.func") </para>
<screen>sudo lspci -s 0000:01:00.0 -vvv</screen>
<para>For example, you can check the settings coded in Xilinx PCIe core, like SUBSYSTEM_ID:</para>
<para>For example, you can check the settings coded in Xilinx PCIe core, like Subsystem Device ID:</para>
<screen>0000:01:00.0 Processing accelerators: IBM Device 0477 (rev 02) (prog-if ff)
Subsystem: IBM Device 0660</screen>
<para>Link Speed</para>
@ -333,10 +335,10 @@ Kernel driver in use: cxl-pci
Kernel modules: cxl</screen>
</listitem>
<listitem>
<para>If nothing shows by "ls /dec/cxl", you should check PCIe config space by</para>
<para>If nothing shows by <userinput>ls /dev/cxl</userinput>, you should check PCIe config space:</para>
<para>Please change the PCIe device identifier (0000:00:00.1) accordingly. Make sure you have seen the VSEC is properly linked. If not, go back to check your VSEC address in capi_vsec.vhdl in last chapter.</para>
<para>Please pick up the PCIe device identifier (0000:00:00.1) you want to check. Make sure the VSEC is properly linked. If not, go back to check "capi_vsec.vhdl".</para>
<para>Use <link xlink:href="https://github.com/ibm-capi/capi-utils">capi-utils</link> to program the bitstream files. If it succeeds, it proves that the Flash interface has been configured correctly. After this step, you can get rid of JTAG connector and use "capi-flash-script" to program the FPGA bitstreams. </para>
<para>The mechanic behind "capi-flash-script" is: </para>
<para>There is a flash controller on FPGA (in capi_bsp_wrap), and it connects to PCIe config space. The flash controller exposes four VSEC registers to allow host system to control. They are "Flash Address Register", "Flash Size Register", "Flash Status/Control Register" and "Flash Data Port". See in <link xlink:href="http://openpowerfoundation.org/wp-content/uploads/resources/hwarch-caia-spec/content/ch_preface.html">Coherent Accelerator Interface Architecture</link>, Chapter 12.3, "CAIA Vendor-Specific Extended Capability Structure". So capi-utils src C file reads FPGA bitstream "bin" file, and writes the bytes to VSEC "Flash Data Port" register. So the bytes are sent to PCIe, through Flash controller and finally arrive to flash memory on the card.</para>
<para>There is a flash controller on FPGA (in capi_bsp_wrap), and it connects to PCIe config space. The flash controller exposes four VSEC registers to allow host system to control. They are: </para>
<para>The details are decribed in <link xlink:href="http://openpowerfoundation.org/wp-content/uploads/resources/hwarch-caia-spec/content/ch_preface.html">Coherent Accelerator Interface Architecture</link>, Chapter 12.3, "CAIA Vendor-Specific Extended Capability Structure". So the C file in <link xlink:href="https://github.com/ibm-capi/capi-utils">capi-utils</link> reads FPGA bitstream "bin" file, and writes the data to VSEC "Flash Data Port" register. So the bytes are sent through PCIe, to Flash controller and finally arrive to flash memory on the card.</para>
<listitem><para>Select another action example (hdl_example with DDR) or hls_memcopy.</para></listitem>
<listitem><para>"make model" and "make sim". Make sure the DDR simulation model works well.</para></listitem>
<listitem><para>"make image" to generate the bitstream files.</para></listitem>
<listitem><para><userinput>make model</userinput> and <userinput>make sim</userinput>. Make sure the DDR simulation model works well.</para></listitem>
<listitem><para><userinput>make image</userinput> to generate the bitstream files.</para></listitem>
<listitem><para>Use capi-utils to program the bitstream "bin" file to the card.</para></listitem>
<listitem><para>Run the application to see whether it works.</para></listitem>
</orderedlist>
<para>Basically SNAP only implemented 1 DDR Bank (or channel) while most cards have 2 to 4 banks. (N250S+ is one of the rare card which has only 1 DDR bank). The main reason was that depending on user's needs, there are two options: the first is to just extend the size of the first bank by adding this 2nd bank on the same DDR memory controller. The other option is to use 2 (or more) memory controllers in parallel to have a higher throughput. This later option means that you will need to duplicate the DDR memory controller in place and this will take twice the place in the design. In this case, the action_wrapper also requires change to add the additional DDR ports. For HLS design, another HLS DDR port should be added into "actions/[YOUR_ACTION]/hw/XXX.CPP". As for an opensource project, everyone is welcomed to add your contribution by implementing it and add it to the SNAP design.</para>
<para>Basically SNAP only implemented one DDR Bank (or channel) while most cards have two to four banks. (N250S+ is one of the rare card which has only one DDR bank). To implement more DDR channels, depending on user's needs, there are two options: the first is to just extend the size of the first bank by adding this second bank on the same DDR memory controller. The other option is to use two (or more) memory controllers in parallel to have a higher throughput. This later option means that you will need to duplicate the DDR memory controller in place and this will take twice the place in the design. In this case, the action_wrapper also requires change to add the additional DDR ports. For HLS design, another HLS DDR port should be added into "actions/[YOUR_ACTION]/hw/XXX.CPP". As for an opensource project, everyone is welcomed to add your contribution by implementing it and add it to the SNAP design.</para>
</section>
@ -410,9 +419,9 @@ Kernel modules: cxl</screen>
<section><title>Cleanup and submit</title>
<para>Now a new FPGA card has been enabled to CAPI2.0 SNAP. Cleanup your workspace, check files and submit your work!</para>
<itemizedlist>
<listitem><para>capi-utils is independent. Just create a pull request and assign a reviewer. It can only been merged into master branch after having been reviewed.</para></listitem>
<listitem><para>Submit the pull request of your "capi2-bsp fork" before "snap fork". Assign the reviewer and wait capi2-bsp to be merged into https://github.com/open-power/capi2-bsp master branch</para></listitem>
<listitem><para>Update the submodule pointer to the latest "open-power/capi2-bsp" master and then submit the pull request of your forked snap.</para></listitem>
<listitem><para>Capi-utils is independent. Just create a pull request and assign a reviewer. It can only been merged into master branch after having been reviewed.</para></listitem>
<para>A complete accelerator has software part (APP, or Applications) running on CPU Processor and the hardware part (AFU, Acceleration Function Unit) running on FPGA chip. APP and AFU are sharing host memory, that means, they both can read and write the 2^64 range of virtual memory address. To make it happen, CAPI technology has a CAPP (Coherent Acceleration Processor Proxy) logic unit in Processor chip, and also needs a PSL (Processor Service Layer) logic unit in FPGA chip. For CAPI1.0 and CAPI2.0, the interconnection between processor and FPGA is using PCIe physical links and PCIe form factor.</para>
<para> CAPI1.0 uses PCIe Gen3x8. </para>
<para> CAPI2.0 uses PCIe Gen4x8 or Gen3x16.</para>
<para> OpenCAPI is not covered in this document. Please check <link xlink:href="https://opencapi.org">https://opencapi.org</link> for more information. </para>
<para> OpenCAPI is not covered in this document. Visit <link xlink:href="https://opencapi.org">https://opencapi.org</link> for more information. </para>
</section>
<section> <title>Enable PSL IP on FPGA</title>
<para> Let's focus on the FPGA side.</para>
<para> A customer FPGA card needs to have a PSL module (Processor Service Interface) to become a "CAPI-enabled" card. This PSL module is provided by OpenPower Foundation and is an IBM IP. </para>
<para>This document only applies to the cards using Xilinx FPGA chips.</para>
<para>A customer FPGA card needs to have a PSL module (Processor Service Interface) to become a "CAPI-enabled" card. This PSL module is provided by OpenPower Foundation. </para>
<itemizedlist>
<listitem><para> For CAPI1.0, PSL module and the surrounding board specific modules are provided in the form of a routed dcp file (Xilinx Vivado design checkpoint). It's usually called b_route_design.dcp. </para></listitem>
<listitem><para> For CAPI2.0, PSL is an IP package with encrypted source code. It's named like ibm.com_CAPI_PSL9_WRAP_2.00.zip.</para></listitem>
</itemizedlist>
<para> They can be downloaded at <link xlink:href="https://www.ibm.com/systems/power/openpower">https://www.ibm.com/systems/power/openpower</link>. From the menu, select "CAPI","Coherent Accelerator Processor Interface (CAPI)" or directly click the "CAPI" icon to go to the CAPI section. Then download the appropriate files depending on your target system being POWER8 (CAPI 1.0) or POWER9 (CAPI 2.0). You need to register an IBM ID to download them.</para>
<para> For a new FPGA card, if you want to enable CAPI on it, it simply means to create a board supporting package which includes the PSL module onto the FPGA and let it work. There are two levels: HDK and SNAP. </para>
<section> <title>HDK</title>
<para>For HDK, a project from FPGA Vendors (i.e, a Xilinx Vivado project) which is composed of <emphasis>BSP</emphasis> (Board Supporting Package, containing PSL module) and sample user logic (AFU), is delivered to acceleration developers. This project is called HDK (Hardware Development Kit). </para>
<para>Users can develop CAPI accelerators in two modes: HDK and SNAP.</para>
<para>HDK is the abbreviation of Hardware Development Kit. As shown in the diagram below, on the FPGA side, you need a Xilinx Vivado project which includes two parts: BSP (Board Supporting Package, containing PSL module) and AFU (Acceleration Function Unit). How to generate BSP will be introduced in Chapter <xref linkend="chapter_capi20_bsp" endterm="chapter_capi20_bsp_title"/></para>
<para>The developers working on HDK level need to know the details about PSL interface specifications and write Verilog/VHDL logic to interact to it. Please refer to <link xlink:href="http://openpowerfoundation.org/wp-content/uploads/resources/psl-afu-spec/content/go01.html"> CAPI1.0 PSL Spec</link> and <link xlink:href="http://openpowerfoundation.org/wp-content/uploads/resources/v2-psl-afu-spec/content/ch_preface.html"> CAPI2.0 PSL Spec</link> or search "PSL/AFU interface" in your web browser. </para>
<para>As a full development environment, you also need SDK (Software Development Kit) which contains the example application software code and PSLSE (PSL Simulation Engine) for a software-hardware together simulation to guarantee the correctness of accelerator design. </para>
<para>HDK provides the maximum available FPGA resource area and the shortest latency. However, we recommend developers to work on SNAP because SNAP simplifies the developing work significantly. </para>
<para>AFU is where to implement user-defined functions. The developer working on AFU needs to understand the protocol between AFU and BSP, which is called PSL/AFU interface specification. Please refer to <link xlink:href="http://openpowerfoundation.org/wp-content/uploads/resources/psl-afu-spec/content/go01.html"> CAPI1.0 PSL Spec</link> and <link xlink:href="http://openpowerfoundation.org/wp-content/uploads/resources/v2-psl-afu-spec/content/ch_preface.html"> CAPI2.0 PSL Spec</link> or search "PSL/AFU interface" in your web browser. </para>
<para>When you develop an acceleration, you also need <link xlink:href="https://github.com/ibm-capi/pslse">PSLSE</link> (PSL Simulation Engine) for a software-hardware co-simulation to guarantee the correctness of accelerator design. </para>
<para>When you deploy the acceleration to OpenPower servers, it requires user library <link xlink:href="https://github.com/ibm-capi/libcxl">libcxl</link> and kernel module <userinput>cxl</userinput> to run the application. </para>
<para>In all, HDK mode will provide the maximum control, utilization of resources and shortest latency. However, SNAP mode simplifies and standardizes the application development significantly and is more recommended.</para>
</section>
<section> <title>SNAP</title>
<para>SNAP is the abbreviation of Storage, Networking and Analytics Programming. It is an open-source acceleration development framework <link xlink:href="https://github.com/open-power/snap"> https://github.com/open-power/snap</link>. On the FPGA side, SNAP framework adds a PSL/AXI bridge, a DDR SDRAM controller and an optional NVMe controller. Thus, the developer can focus on their acceleration kernel logic (here we call it hardware action) and interface the framework via several AXI ports. </para>
<figure pgwide="1" xml:id="snap1">
<para>SNAP is the abbreviation of Storage, Networking and Analytics Programming. It is an open-source acceleration development framework <link xlink:href="https://github.com/open-power/snap"> https://github.com/open-power/snap</link>. The benefits are: </para>
<orderedlist>
<listitem><para>On the FPGA side, SNAP framework adds a bridge to provide AXI interface to developers. So the developer can focus on acceleration function logic design, and doesn't need to study the details of PSL interface specification. AXI is the defacto industry standard for on-chip bus interconnections and is part of <link xlink:href="https://www.arm.com/products/silicon-ip-system/embedded-system-design/amba-specifications">AMBA (Advanced Microcontroller Bus Architecture)</link>.</para></listitem>
<listitem><para>It also provides DDR SDRAM controller and an optional NVMe controller. The developer can use the card memory or storage directly.</para></listitem>
<listitem><para>SNAP supports using HLS (High Level Synthesis) to develop the acceleration functional unit ("Hardware Action" in yellow box). Developers can write C++ functions and Vivado HLS will compile/convert them to Verilog or VHDL design automatically.</para></listitem>
<listitem><para>A new layer of user library "libsnap" provides more convenient APIs.</para></listitem>
<listitem><para>SNAP is an integrated developing environment that the developer can configure, create Vivado project, run co-simulation or build bitstream with simple commands.</para></listitem>
<listitem><para>Many action examples help new developers to get started.</para></listitem>
<para>Equipping the new FPGA card with SNAP framework needs a few additional steps and is introduced in Chapter <xref linkend="chapter_capi20_snap" endterm="chapter_capi20_snap_title"/></para>
<note><para>This document focuses on CAPI2.0. For CAPI1.0 enablement, please contact <email>capi-snap-doc@mailinglist.openpowerfoundation.org</email> for more information.</para>
<para>It is assumed the reader knows how to work on Vivado Project and SNAP already. You can find many materials on how to develop an accelerator with SNAP (Training videos, "docs" folder on <link xlink:href="https://github.com/open-power/snap"> snap github</link>, or other webpages) so they are not discussed in this document.</para></note>
<para>This document focus on CAPI2.0. For CAPI1.0 enablement, the BSP part is a little different, please contact an IBM representative for more information. The SNAP part is the same.</para>
<para>In following chapters, we introduce how to:</para>
<itemizedlist>
<listitem><para>Enable BSP </para></listitem>
<listitem><para>Enable SNAP </para></listitem>
</itemizedlist>
<para>We assume the reader knows how to work on Vivado Project and SNAP already. You can find many materials on how to develop an accelerator with SNAP (Training videos, "docs" folder on <link xlink:href="https://github.com/open-power/snap"> snap github</link>, or other webpages) so they are not discussed in this document.</para>