FPGA HPC Systems

Under an agreement with FORTH, EXAPSYS will use, extend, optimize, and support a novel FPGA-based HPC prototype consisting of 160 Xilinx UltraScale+ MPSoC FPGAs, 640 64-bit Arm cores, 2.5 terabytes of DRAM, and 10 terabytes of solid-state disk (SSD) storage.
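To put the headline totals in per-device terms, the short sketch below divides them across the 160 FPGAs. It assumes the cores, DRAM, and SSD capacity are spread evenly over the devices, which the description above does not state, and it shows both decimal (1 TB = 1000 GB) and binary (1 TB = 1024 GB) readings because the text does not say which is meant.

```python
# Headline totals quoted in the text.
FPGAS = 160
ARM_CORES = 640
DRAM_TB = 2.5
SSD_TB = 10


def per_device(total, devices=FPGAS):
    """Even split of a system-wide total across devices (an assumption)."""
    return total / devices


cores_per_fpga = per_device(ARM_CORES)         # 4.0 cores per MPSoC
dram_gb_decimal = per_device(DRAM_TB * 1000)   # 15.625 GB per device
dram_gb_binary = per_device(DRAM_TB * 1024)    # 16.0 GB per device
ssd_gb_decimal = per_device(SSD_TB * 1000)     # 62.5 GB per device
ssd_gb_binary = per_device(SSD_TB * 1024)      # 64.0 GB per device

print(cores_per_fpga, dram_gb_binary, ssd_gb_binary)
```

Under the binary reading the figures come out as round numbers (4 cores, 16 GB of DRAM, and 64 GB of SSD per device), which suggests, but does not confirm, that the totals were rounded from such a configuration.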

System Architecture

The prototype HPC system employs a scalable programming environment and hardware architecture tailored to the characteristics and trends of current and future HPC applications, significantly reducing data traffic, energy consumption, and delays. It follows a holistic approach, providing a novel heterogeneous, energy-efficient, hierarchical architecture, a hybrid OpenCL programming environment, and a runtime system. The architecture, programming model, and runtime system all follow a hierarchical approach in which the system is partitioned into multiple autonomous Workers (i.e., compute nodes). Workers are interconnected in a tree-like structure to form larger Partitioned Global Address Space (PGAS) partitions, which are in turn hierarchically interconnected via MPI.
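The two-level communication hierarchy described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the prototype's runtime: the names `Worker`, `build_system`, and `transport` are hypothetical, as are the choice of one Worker per FPGA and 16 Workers per PGAS partition. It shows only the routing idea: Workers in the same partition share a global address space, while Workers in different partitions communicate through message passing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    worker_id: int
    partition_id: int  # which PGAS partition this worker belongs to


def build_system(num_workers, workers_per_partition):
    """Group workers into equally sized partitions (sizes are hypothetical)."""
    return [Worker(i, i // workers_per_partition) for i in range(num_workers)]


def transport(src, dst):
    """Pick the communication layer for a transfer between two workers."""
    if src.worker_id == dst.worker_id:
        return "local"
    if src.partition_id == dst.partition_id:
        return "pgas"  # same partition: shared global address space
    return "mpi"       # different partitions: message passing


# Assumed layout: 160 workers, 16 per partition, giving 10 partitions.
workers = build_system(num_workers=160, workers_per_partition=16)
print(transport(workers[0], workers[0]))   # local
print(transport(workers[0], workers[15]))  # pgas
print(transport(workers[0], workers[16]))  # mpi
```

Keeping the routing decision in one function mirrors the idea that the application sees a single programming model while the layer underneath chooses the mechanism.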

The prototype is programmed seamlessly using an extended version of OpenCL, with MPI hidden beneath the extended OpenCL runtime.
When running the High Performance Linpack (HPL) and High Performance Conjugate Gradient (HPCG) benchmarks on the Arm cores, the prototype uses 6 to 10 times less energy than a high-end Intel CPU cluster.

When the reconfigurable resources are also used, the prototype is 2.5 to 400 times faster and 46 to 600 times more energy efficient than conventional Intel CPU-based systems, and up to 6 times faster and 10 times more energy efficient than a parallel system using high-end GPUs.
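Speedup and energy-efficiency gain are linked through average power, since energy is power multiplied by time. The sketch below makes that relationship explicit. It assumes both figures refer to the same workload, which the text does not guarantee for its "up to" numbers, so the result is an illustration rather than a measured value. The function name `implied_power_ratio` is hypothetical.

```python
def implied_power_ratio(speedup, energy_gain):
    """Baseline power divided by prototype power, for one and the same workload.

    energy = power * time, so
    energy_gain = E_base / E_new
                = (P_base / P_new) * (T_base / T_new)
                = power_ratio * speedup
    """
    return energy_gain / speedup


# GPU comparison from the text: up to 6x faster, 10x more energy efficient.
gpu_case = implied_power_ratio(speedup=6, energy_gain=10)
print(round(gpu_case, 2))  # 1.67
```

Read this way, if both GPU figures came from the same run, the prototype would have drawn roughly 1.67 times less average power than the GPU system during it.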

The measured energy efficiency ranges from 9 to 20 GFLOPS per watt, even though the chips are implemented in an older 16 nm process technology.
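A GFLOPS-per-watt figure can also be read as energy per floating-point operation, which is sometimes easier to compare across systems. The conversion below is plain unit arithmetic (1 GFLOPS/W equals 10^9 operations per joule); the function name is hypothetical.

```python
def picojoules_per_flop(gflops_per_watt):
    # 1 GFLOPS/W = 1e9 FLOP per joule, so one FLOP costs
    # 1e-9 / x joules, which is 1000 / x picojoules.
    return 1000.0 / gflops_per_watt


for efficiency in (9, 20):
    print(efficiency, round(picojoules_per_flop(efficiency), 1))
# 9 111.1
# 20 50.0
```

The quoted range therefore corresponds to roughly 50 to 111 picojoules per floating-point operation.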