High performance reconfigurable computing with cellular automata

Murtaza, S.

Appendix

Spartan-3 Board

The Spartan-3 starter kit board [118] from Xilinx has one XC3S200 FPGA and two 512 KB SRAM banks. The two SRAM banks are not fully independent, because both are addressed through common pins of the FPGA. A high-level block diagram of the board is shown in Figure 10.1. The Spartan-3 board runs at a maximum clock frequency of 50 MHz, and the asynchronous SRAM chips on the board have an access time of $\leq 10$ ns. Some of the relevant components and features available on this board are as follows:

- 200,000-gate Xilinx Spartan-3 XC3S200 FPGA
  - 4,320 logic cells
  - Twelve 18K-bit block RAMs (216K bits)
  - Twelve 18x18 hardware multipliers
  - Up to 173 user-defined I/O signals
- 1M-byte of Asynchronous SRAM
  - Two independent 256Kx16 10 ns SRAM arrays
  - Individual chip select per device
- RS-232 serial port, which connects to a computer's serial port through a straight-through serial cable
- 50 MHz crystal oscillator clock source
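
The capacity and timing figures in the list above can be cross-checked with a little arithmetic. The following is only an illustrative sketch: the variable names are ours, "K" is taken to mean 1024, and the timing check rests on the simplifying assumption that an SRAM access fits in one clock cycle whenever the access time does not exceed the clock period (setup, hold, and pad delays are ignored).

```python
# Cross-check of the Spartan-3 board figures listed above.
# Names are illustrative; only the numbers come from the feature list.

KIBI = 1024

# Block RAM: twelve blocks of 18K bits each.
block_ram_kbits = 12 * 18

# SRAM: two devices, each 256K words of 16 bits.
sram_bytes_per_chip = 256 * KIBI * 16 // 8
sram_total_bytes = 2 * sram_bytes_per_chip

# Timing: 50 MHz board clock versus 10 ns worst-case SRAM access time.
clock_hz = 50_000_000
clock_period_ns = 1e9 / clock_hz
sram_access_ns = 10.0
# Simplifying assumption: setup/hold and I/O pad delays are ignored.
fits_in_one_cycle = sram_access_ns <= clock_period_ns

print(f"block RAM total : {block_ram_kbits}K bits")
print(f"SRAM per chip   : {sram_bytes_per_chip // KIBI} KB")
print(f"SRAM total      : {sram_total_bytes // KIBI} KB")
print(f"clock period    : {clock_period_ns:.0f} ns")
print(f"access fits in one cycle: {fits_in_one_cycle}")
```

Under these assumptions the clock period is twice the SRAM access time, which suggests that the external memory need not limit a design clocked at the board frequency.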

ADM-XRC-FX140 Board

Figure 10.1: Spartan-3 board: FPGA to SRAM connections. Image taken from [118]

The ADM-XRC-4FX PCI Mezzanine board [3] from Alpha-Data [38] is a Xilinx Virtex-4 FX140 [119] [117] based PMC with four independent 256 MB DDR2 SDRAM banks. Each DDR2 bank can be accessed independently from the FPGA (user logic) and, via the PCI interface, from the host machine. A high-level block diagram of the board is shown in Figure 10.2. Some of the relevant components and features available on this board are as follows:

- Virtex-4 FX140 FPGA
  - 142,128 logic cells
  - 552 18K-bit block RAMs (9,936K bits)
  - 192 DSP slices
  - 2 PowerPC processor blocks and other I/O blocks

- Four independent 64Mx32 DDR2 SDRAM (1GB total)
- High performance PCI and DMA controllers
- Programmable user clock from 31.25 MHz to 625 MHz

The ADM-XRC-4FX board comes with a software development kit [4] that includes drivers, header files, and libraries that allow a C or C++ program running on the host machine to communicate directly with the board. Alpha-Data also provides an ADM-XRC-4FX Co-Processor Development Kit, used for FPGA-based HPC implementations. Additional code is provided for board initialization and selection, control of the programmable clocks, and handling of FPGA configuration files.

Maxwell 64-FPGA Supercomputer

Maxwell is a 64-FPGA supercomputer [8] from EPCC at the University of Edinburgh, Scotland. Essentially, it is an IBM BladeCentre cluster with FPGA acceleration. Each blade server includes one Intel Xeon dual-core processor and two FPGAs. The host CPU and the attached FPGAs are connected through a standard IBM PCI-X expansion module. The blades are connected over gigabit Ethernet through a switch with 40 Gb/s throughput. In addition to the standard gigabit Ethernet network for communication between CPUs, a dedicated FPGA network is provided. The FPGA network consists of point-to-point links between the MGT connectors of adjacent FPGAs, forming a two-dimensional torus (see [8] for more details). The idea behind having two networks is, as [8] puts it, to “use Ethernet network purely as a control network and to perform parallel communications over RocketIO”.
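
To make the torus wiring concrete, the following sketch computes the four neighbours of an FPGA in a two-dimensional torus. It is a hypothetical illustration, not Maxwell's actual configuration: the 8 x 8 shape, the row-major numbering of the nodes, and the function name `torus_neighbours` are all assumptions, since the text only states that the 64 FPGAs form a two-dimensional torus.

```python
def torus_neighbours(node, rows=8, cols=8):
    """Return the four neighbours of `node` in a rows x cols 2-D torus.

    Nodes are numbered row-major starting at 0. Both this numbering and
    the default 8 x 8 shape are assumptions made for illustration.
    """
    if not 0 <= node < rows * cols:
        raise ValueError("node index out of range")
    r, c = divmod(node, cols)
    # The modulo makes every edge wrap around to the opposite side,
    # which is what turns a plain grid into a torus.
    return {
        "north": ((r - 1) % rows) * cols + c,
        "south": ((r + 1) % rows) * cols + c,
        "west": r * cols + (c - 1) % cols,
        "east": r * cols + (c + 1) % cols,
    }


if __name__ == "__main__":
    for n in (0, 27, 63):
        print(n, torus_neighbours(n))
```

Because of the wrap-around links, every node has exactly four neighbours, so a cellular automaton lattice partitioned across the FPGAs can exchange boundary cells with the same communication pattern on every device.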

The FPGAs in Maxwell are Xilinx Virtex-4 devices; half of them are of the FX type and the rest are of the LX type. The FX devices come on a PCI expansion card supplied by Alpha-Data [38], called ADM-XRC, which is fitted with a Xilinx Virtex-4 FX100 FPGA and four independent DDR2 SDRAM banks totalling 1 GB of on-board memory. The LX devices come on a card supplied by Nallatech [79], which is fitted with a Xilinx Virtex-4 LX160 FPGA and SRAM and SDRAM banks. Each vendor-supplied FPGA board comes, first, with its own development kit that enables communication between the host software process and FPGA-board resources such as the memory banks and the FPGA core and, second, with a set of hardware libraries that provide the necessary interface between the user-defined logic and FPGA-board resources such as memory and various types of I/O devices.

Since our single-FPGA D2Q9 LBM CA was implemented on the ADM-XRC-FX140, an Alpha-Data PCI card, porting it to a multi-FPGA system that uses a similar card from Alpha-Data (the ADM-XRC FX100 PCI card) was straightforward. The ADM-XRC FX100 is similar to the FX140 board, as explained in Appendix 10, except for the FPGA chip itself. This board comes with a Virtex-4 FX100, which has a lower logic density than the FX140. This difference in available logic also explains why we were able to implement more LBM PEs on the FX140 FPGA (see Chapter 5 for details) than on the FX100 FPGAs. Apart from the FPGA chip, the on-board resources such as memory and DMA controllers are the same for both boards [3]. Some of the relevant components and features available on the ADM-XRC-FX100 board are as follows:

- Virtex-4 FX100 FPGA
  - 94,896 logic cells
  - 376 18K-bit block RAMs (6,768K bits)
  - 160 DSP slices
  - 2 PowerPC processor blocks and other I/O blocks
- Four independent 64Mx32 DDR2 SDRAM (1GB total)
- High performance PCI and DMA controllers
- Programmable user clock from 31.25 MHz to 625 MHz
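
The difference between the two FPGAs can be quantified from the feature lists in this appendix. The sketch below is illustrative only: the dictionary and variable names are ours, "K" and "M" are taken as powers of two, and the ratios say nothing by themselves about how the number of LBM PEs scales with the available logic.

```python
# Resource figures copied from the FX140 and FX100 feature lists
# in this appendix; the names are illustrative.
fx140 = {"logic_cells": 142_128, "block_rams": 552, "dsp_slices": 192}
fx100 = {"logic_cells": 94_896, "block_rams": 376, "dsp_slices": 160}

# Fraction of each FX140 resource that is available on the FX100.
ratios = {name: fx100[name] / fx140[name] for name in fx140}

# Block RAM capacity in K bits (18K bits per block).
fx140_bram_kbits = fx140["block_rams"] * 18
fx100_bram_kbits = fx100["block_rams"] * 18

# On-board DDR2 (identical on both boards): four banks of 64M x 32 bits.
ddr2_bytes = 4 * 64 * 2**20 * 32 // 8

for name, ratio in ratios.items():
    print(f"{name:12s} FX100/FX140 = {ratio:.2f}")
print(f"FX140 block RAM: {fx140_bram_kbits}K bits")
print(f"FX100 block RAM: {fx100_bram_kbits}K bits")
print(f"DDR2 per board : {ddr2_bytes // 2**20} MB")
```

By these figures the FX100 offers roughly two thirds of the logic cells and block RAMs of the FX140, while the external memory is the same on both boards.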

The FX100-based ADM-XRC-4FX board from Alpha-Data comes with a software development kit [4], as explained in Appendix 10. Additionally, the Parallel Toolkit (PTK), a set of high-level configuration tools and APIs, is supplied by the FPGA HPC Alliance [9] [45]. The PTK comprises a library of C++ classes providing a) abstract interfaces to application components implemented in FPGA hardware, b) a means of loading or configuring FPGAs with bitstreams, and c) a standard way of launching parallel FPGA jobs. The PTK and its API are explained in [9].