ConFlux is a cluster that seamlessly combines the computing power of HPC with the analytical power of data science. The next generation of computational physics requires HPC applications (running on external clusters) to interconnect with large data sets at run time. ConFlux provides low-latency communications for in-core and out-of-core data, cross-platform storage, high-throughput interconnects, and massive memory allocations. The file system and scheduler natively handle extreme-scale machine learning and traditional HPC modules in a tightly integrated workflow, rather than in segregated operations, leading to significantly lower latencies, fewer algorithmic barriers, and less data movement.
The ConFlux cluster will be built with approximately 43 IBM Power8 two-socket “Firestone” S822LC compute nodes, each providing 20 cores, and 15 Power8 two-socket “Garrison” compute nodes, each providing an additional 20 cores. Each of the Garrison nodes will also host four NVIDIA Pascal GPUs connected to the Power8 system bus via NVIDIA’s NVLink technology. Each node has local high-speed flash memory for random access.
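As a rough illustration of what these node counts add up to, the following Python sketch tallies aggregate CPU cores and GPUs. It takes the approximate figure of 43 Firestone nodes at face value, so the totals are estimates; the `NodeGroup` class and `totals` function are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """One homogeneous group of compute nodes (hypothetical helper type)."""
    name: str
    nodes: int
    cores_per_node: int
    gpus_per_node: int = 0


# Figures taken from the description above; the Firestone count is approximate.
GROUPS = [
    NodeGroup("Firestone", nodes=43, cores_per_node=20),
    NodeGroup("Garrison", nodes=15, cores_per_node=20, gpus_per_node=4),
]


def totals(groups):
    """Return (total CPU cores, total GPUs) across all node groups."""
    cores = sum(g.nodes * g.cores_per_node for g in groups)
    gpus = sum(g.nodes * g.gpus_per_node for g in groups)
    return cores, gpus


total_cores, total_gpus = totals(GROUPS)
print(f"Estimated CPU cores: {total_cores}")
print(f"Estimated GPUs: {total_gpus}")
```

Under these assumptions the cluster would offer on the order of 1,160 CPU cores and 60 GPUs; the exact totals depend on the final number of Firestone nodes.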
All compute and storage resources are connected via a 100 Gb/s InfiniBand fabric. The IBM and NVLink connectivity, combined with IBM CAPI technology, will provide the unprecedented data transfer throughput required for the data-driven computational physics that researchers will be conducting.
ConFlux is funded by a National Science Foundation grant; the Principal Investigator is Karthik Duraisamy, Assistant Professor of Aerospace Engineering and Director of the Center for Data-Driven Computational Physics (CDDCP). ConFlux and the CDDCP are under the auspices of the Michigan Institute for Computational Discovery and Engineering.