The BIT1 and BIT3 codes are run routinely on supercomputers made available through PRACE.
The scalability tests of BIT1 and BIT3 shown were carried out on the Marconi100 supercomputer
(IBM Power System AC922, IBM POWER9 16C 3 GHz, NVIDIA Volta V100, dual-rail Mellanox EDR InfiniBand)
on up to 24,576 cores.
GENE is run on the largest supercomputers in the world, often exploiting all of the capabilities of a given platform. The weak scaling shown in the figure below was measured on Summit. Each Summit node is equipped with two POWER9 sockets with 21 cores each and 6 NVIDIA Volta V100 GPUs. The CPU runs therefore cover a range from 336 to 21,504 cores and the GPU runs a range from 48 to 3,072 GPUs, which corresponds to 8 to 512 nodes. Strong scaling to more than 262,000 cores (on Titan) has also been demonstrated.
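The core and GPU counts quoted above follow directly from the Summit node layout. The short sketch below reproduces that arithmetic. The node counts 8 and 512 are not stated in the text; they are inferred from the quoted totals, and the helper name `resources` is purely illustrative.

```python
# Summit node layout as stated in the text.
CORES_PER_NODE = 2 * 21   # two POWER9 sockets with 21 cores each
GPUS_PER_NODE = 6         # NVIDIA V100 GPUs per node


def resources(nodes):
    """Return (cpu_cores, gpus) available on the given number of Summit nodes."""
    return nodes * CORES_PER_NODE, nodes * GPUS_PER_NODE


# Assumed endpoints of the weak-scaling scan (inferred, not quoted in the text).
for nodes in (8, 512):
    cores, gpus = resources(nodes)
    print(f"{nodes} nodes: {cores} cores, {gpus} GPUs")
```

With 8 nodes this gives 336 cores and 48 GPUs, and with 512 nodes it gives 21,504 cores and 3,072 GPUs, matching the ranges quoted for the CPU and GPU runs.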
The largest run to date was carried out on Summit, using almost the entire machine: 27,600 NVIDIA V100 GPUs, 10 trillion particles, 400 billion cells, 313 TByte of particle data and 14 TByte of cell data, resulting in a single checkpoint size of 327 TByte. A single time step took 0.6 seconds. The scaling tests were performed on JUWELS BOOSTER at JSC, Jülich (NVIDIA A100 architecture, with the PIConGPU SPEC benchmark), and on the Summit system at Oak Ridge National Laboratory (NVIDIA V100 architecture).
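As a rough consistency check, the figures quoted for this run can be turned into per-particle and per-GPU quantities. This is a back-of-the-envelope sketch only: it assumes decimal units (1 TByte = 10^12 bytes) and that every particle is updated once per time step, neither of which is stated in the text.

```python
TB = 1e12  # assumption: decimal terabytes

# Figures quoted in the text.
particles = 10e12        # 10 trillion particles
cells = 400e9            # 400 billion cells
particle_bytes = 313 * TB
cell_bytes = 14 * TB
gpus = 27_600
step_seconds = 0.6

# Derived quantities.
checkpoint_tb = (particle_bytes + cell_bytes) / TB   # total checkpoint size
bytes_per_particle = particle_bytes / particles      # storage per particle
bytes_per_cell = cell_bytes / cells                  # storage per cell
particles_per_cell = particles / cells               # average occupancy
# Assumes one update per particle per step.
updates_per_gpu_per_s = particles / gpus / step_seconds

print(f"checkpoint: {checkpoint_tb:.0f} TByte")
print(f"bytes per particle: {bytes_per_particle:.1f}")
print(f"bytes per cell: {bytes_per_cell:.1f}")
print(f"particles per cell: {particles_per_cell:.0f}")
print(f"particle updates per GPU per second: {updates_per_gpu_per_s:.2e}")
```

The particle and cell data add up to the quoted 327 TByte checkpoint. Under the stated assumptions this corresponds to roughly 31 bytes per particle, 25 particles per cell on average, and on the order of 6 × 10^8 particle updates per GPU per second.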
The largest run to date was carried out on LUMI-C, using >185,000 cores. Typical production runs are performed on 32–64k cores. Strong scaling tests were performed on up to 200 nodes (25,600 cores) on Mahti (Bull Sequana XH2000) at CSC, Finland, using a realistic, 6D inhomogeneous global magnetospheric simulation setup. The presented figure shows super-ideal (superlinear) strong scaling because the large problem size leads to poor load balance, excessive ghost updates, and sub-optimal cache usage at small node counts, which makes the low-node baseline inefficient.
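To make the term concrete, the sketch below computes speedup and parallel efficiency relative to the smallest run. Super-ideal scaling shows up as an efficiency above 1. The timings and the intermediate node counts are invented purely for illustration and are not measurements from Mahti; the function name `strong_scaling` is likewise hypothetical.

```python
def strong_scaling(nodes, seconds):
    """Return (nodes, speedup, efficiency) rows relative to the first entry."""
    base_nodes, base_time = nodes[0], seconds[0]
    rows = []
    for n, t in zip(nodes, seconds):
        speedup = base_time / t      # measured speedup over the baseline
        ideal = n / base_nodes       # speedup expected from perfect scaling
        rows.append((n, speedup, speedup / ideal))
    return rows


# Hypothetical data: NOT measured values, chosen only to show efficiency > 1.
nodes = [25, 50, 100, 200]
seconds = [100.0, 45.0, 21.0, 11.0]

rows = strong_scaling(nodes, seconds)
for n, speedup, efficiency in rows:
    print(f"{n:4d} nodes: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

Because the efficiency is normalised to the smallest run, an inefficient baseline (poor load balance, many ghost updates, cache misses) inflates the apparent efficiency of every larger run, which is the effect described above.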