ConFlux is a cluster that seamlessly combines the computing power of HPC with the analytical power of data science. The next generation of computational physics requires HPC applications (running on external clusters) to interconnect with large data sets at run time. ConFlux provides low-latency communication for in-core and out-of-core data, cross-platform storage, high-throughput interconnects, and massive memory allocations. The file system and scheduler natively handle extreme-scale machine learning and traditional HPC modules in a tightly integrated workflow, rather than in segregated operations, leading to significantly lower latencies, fewer algorithmic barriers, and less data movement.
The ConFlux cluster is built with ~58 IBM Power8 CPU two-socket “Firestone” S822LC compute nodes, each providing 20 cores. Seventeen Power8 CPU two-socket “Garrison” S822LC compute nodes provide an additional 20 cores each and host four NVIDIA Pascal GPUs per node, connected via NVIDIA’s NVLink technology to the Power8 system bus. Each GPU-based node has local high-speed NVMe flash memory for random access.
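As a rough illustration of the scale these node counts imply, the aggregate core and GPU totals can be tallied as in the sketch below. It rests on two assumptions that the text above does not state outright: the approximate "~58" is treated as exactly 58 nodes, and the 20-core figure is read as per node for both node types. The `NodeGroup` class and its field names are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """A homogeneous group of compute nodes (hypothetical helper type)."""
    name: str
    nodes: int
    cores_per_node: int
    gpus_per_node: int = 0

    @property
    def cores(self) -> int:
        # Total CPU cores contributed by this group
        return self.nodes * self.cores_per_node

    @property
    def gpus(self) -> int:
        # Total GPUs contributed by this group
        return self.nodes * self.gpus_per_node


# Assumption: "~58" taken as exactly 58; 20 cores read as a per-node figure.
firestone = NodeGroup("Firestone", nodes=58, cores_per_node=20)
garrison = NodeGroup("Garrison", nodes=17, cores_per_node=20, gpus_per_node=4)

groups = [firestone, garrison]
total_cores = sum(g.cores for g in groups)
total_gpus = sum(g.gpus for g in groups)

for g in groups:
    print(f"{g.name}: {g.nodes} nodes, {g.cores} cores, {g.gpus} GPUs")
print(f"Total: {total_cores} cores, {total_gpus} GPUs")
```

Under these assumptions the tally comes to roughly 1,500 CPU cores and 68 GPUs; the actual figures depend on the exact node count.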
All compute and storage nodes are connected via a 100 Gb/s InfiniBand fabric. The IBM and NVLink connectivity, combined with IBM CAPI technology, provides the unprecedented data-transfer throughput required for the data-driven computational physics that researchers will be conducting.
ConFlux is funded by a National Science Foundation grant; the Principal Investigator is Karthik Duraisamy, Assistant Professor of Aerospace Engineering and Director of the Center for Data-Driven Computational Physics (CDDCP). ConFlux and the CDDCP are under the auspices of the Michigan Institute for Computational Discovery and Engineering.