https://doi.org/10.1051/epjconf/202430204001
Study on the Particle Sorting Performance for Reactor Monte Carlo Neutron Transport on Apple Unified Memory GPUs
New Compute Laboratory, No. 89, Jianguo Road, Beijing, China
* e-mail: changyuan_liu@163.com
Published online: 15 October 2024
In simulation of nuclear reactor physics using the Monte Carlo neutron transport method on GPUs, the sorting of particles plays a significant role in performance of calculation. Traditionally, CPUs and GPUs are separated devices connected at low data transfer rate and high data transfer latency. Emerging computing chips tend to integrate CPUs and GPUs. One example is the Apple silicon chips with unified memory. Such unified memory chips have opened doors for new strategies of collaboration between CPUs and GPUs for Monte Carlo neutron transport. Sorting particles on CPU and transport on GPU is an example of such new strategy, which has been suffering the high CPU-GPU data transfer latency on the traditional devices with separated CPU and GPU. The finding is that for the Apple M2 max and M3 max chip, sorting on CPU leads to better performance per power than sorting on GPU for the ExaSMR whole core benchmark problems and the HTR-10 high temperature gas reactor fuel pebble problem. The partially sorted particle order has been identified to contribute to the higher performance with CPU sort than GPU. The in-house code using both CPU and GPU achieves 7.6 times (M3 max) power efficiency that of OpenMC on CPU for ExaSMR whole core benchmark with depleted fuel, and 130 times (M3 max) for HTR-10 fuel pebble benchmark with depleted fuel.
© The Authors, published by EDP Sciences, 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.