Supercomputer

HPE SGI8600


HPE SGI8600 is a shared supercomputer system available to the various research and development departments of the Japan Atomic Energy Agency and the National Institutes for Quantum and Radiological Science and Technology. The system has a total theoretical peak performance of 12.6 PFLOPS, a total RAM capacity of 252 TB, and an HDD storage capacity of 17.6 PB.
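As a quick arithmetic check, the two calculation units described below account for nearly all of that peak; attributing the small remainder to the other processing units is our assumption, not something stated here.

    # Quick check: the two calculation units account for almost all of
    # the 12.6 PFLOPS system peak; the remainder presumably comes from
    # the ISV application and login processing units.
    print(9.739 + 2.801)  # 12.54 PFLOPS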

This supercomputer is an integrated system comprising a GPGPU calculation unit, a CPU calculation unit, an ISV (Independent Software Vendor) application processing unit, a login processing unit, an HDD storage unit, and a file backup unit. The units are connected by a 4x EDR InfiniBand network.

The GPGPU calculation unit in HPE SGI8600 is a blade-type large-scale cluster system with a total theoretical peak performance of 9.739 PFLOPS. It comprises 272 computing nodes housed in 4 racks (up to 72 nodes per rack). Each computing node contains two Intel Xeon Gold 6248R processors (3.0 GHz, 24 cores), for a total of 13,056 cores, and four NVIDIA Tesla V100 SXM2 32 GB GPUs, for a total of 1,088 GPUs.
The RAM capacity is 384 GB per computing node, giving a total RAM capacity of 102 TB.
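As a rough sanity check, the 9.739 PFLOPS figure can be reproduced from per-device numbers. This is a minimal sketch in Python; the 32 FLOP/cycle per core for the Xeon Gold 6248R (AVX-512 with two FMA units) and the 7.8 TFLOPS FP64 peak of the V100 SXM2 are assumed values, not stated on this page.

    # Theoretical FP64 peak of the GPGPU calculation unit (assumed
    # per-device figures: 32 FLOP/cycle/core for the Xeon Gold 6248R,
    # 7.8 TFLOPS FP64 for each V100 SXM2).
    nodes = 272
    cpu_tflops = 2 * 24 * 3.0e9 * 32 / 1e12   # 2 sockets x 24 cores x 3.0 GHz
    gpu_tflops = 4 * 7.8                      # 4 x NVIDIA Tesla V100 SXM2
    print(nodes * (cpu_tflops + gpu_tflops) / 1000)  # ~9.74 PFLOPS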

The CPU calculation unit in HPE SGI8600 is a blade-type large-scale cluster system with a total theoretical peak performance of 2.801 PFLOPS. It comprises 706 computing nodes housed in 10 racks (up to 72 nodes per rack). Each computing node contains two Intel Xeon Gold 6242R processors (3.1 GHz, 20 cores), for a total of 28,240 cores. The RAM capacity is 192 GB per computing node, giving a total RAM capacity of 132.375 TB.
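The 2.801 PFLOPS figure checks out the same way, again assuming 32 FLOP/cycle per core for these Cascade Lake Xeons.

    # Theoretical FP64 peak of the CPU calculation unit (same assumed
    # 32 FLOP/cycle/core for the Xeon Gold 6242R).
    nodes = 706
    node_tflops = 2 * 20 * 3.1e9 * 32 / 1e12  # 2 sockets x 20 cores x 3.1 GHz
    print(nodes * node_tflops / 1000)         # ~2.80 PFLOPS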

Each computing node has four 4x EDR InfiniBand ports, each connected through an InfiniBand switch blade to a separate network plane. The theoretical transfer performance between arbitrary computing nodes is therefore up to 50 GB/s (4 × 12.5 GB/s) in one direction. A single InfiniBand plane is mainly used for inter-node communication such as MPI and for connecting to the HDD storage unit (Lustre file system); when large-capacity data communication between nodes is needed, all four planes can be used to quadruple the bandwidth.
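The sketch below illustrates the kind of point-to-point traffic these planes carry, as a minimal ping-pong bandwidth probe. The use of mpi4py is our choice for illustration, and whether the MPI library stripes a transfer across all four planes depends on its multi-rail configuration, which this page does not specify.

    # Minimal MPI ping-pong sketch (mpi4py assumed); launch two ranks on
    # different computing nodes, e.g. via the scheduler or
    #   mpirun -np 2 python pingpong.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    n = 100 * 1024 * 1024            # 100 MiB payload
    buf = bytearray(n)

    t0 = MPI.Wtime()
    if rank == 0:
        comm.Send(buf, dest=1)       # rank 0 -> rank 1
        comm.Recv(buf, source=1)     # rank 1 -> rank 0
    elif rank == 1:
        comm.Recv(buf, source=0)
        comm.Send(buf, dest=0)
    t1 = MPI.Wtime()

    if rank == 0:
        # The round trip moves 2*n bytes; compare against the 12.5 GB/s
        # per-plane peak (or 50 GB/s if all four planes are used).
        print("bandwidth ~ %.2f GB/s" % (2 * n / (t1 - t0) / 1e9))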