Performance Portability Strategies for Grid C++ Expression Templates

Peter A. Boyle; M.A. Clark; Carleton DeTar; Meifeng Lin; Verinder Rana; Alejandro Vaquero Avilés-Casco

doi:10.1051/epjconf/201817509006

Proceedings

Open Access

EPJ Web of Conferences 175, 09006 (2018)
https://doi.org/10.1051/epjconf/201817509006

Peter A. Boyle¹, M.A. Clark², Carleton DeTar³, Meifeng Lin⁴,*, Verinder Rana⁴ and Alejandro Vaquero Avilés-Casco³

¹ Higgs Centre for Theoretical Physics, School of Physics & Astronomy, University of Edinburgh, EH9 3FD, UK
² NVIDIA Corporation, Santa Clara, CA 95050, USA
³ Department of Physics and Astronomy, University of Utah, Salt Lake City, UT 84112, USA
⁴ Computational Science Initiative, Brookhaven National Laboratory, Upton, New York 11973, USA

* Speaker, e-mail: mlin@bnl.gov

Published online: 26 March 2018

Abstract

One of the key requirements for the Lattice QCD Application Development effort, part of the US Exascale Computing Project, is performance portability across multiple architectures. Using the Grid C++ expression template as a starting point, we report on the progress made with regard to the Grid GPU offloading strategies. We present both the successes and issues encountered in using CUDA, OpenACC and Just-In-Time compilation. Experimentation and performance on GPUs with an SU(3)×SU(3) streaming test will be reported. We will also report on the challenges of using current OpenMP 4.x for GPU offloading in the same code.
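
For readers unfamiliar with the benchmark, the idea of an SU(3)×SU(3) streaming test can be sketched in plain C++ as below. This is a minimal CPU-only illustration under our own assumptions, not Grid's implementation and not the offloaded code discussed in the paper: the names `SU3`, `su3_mult`, `su3_stream_mult`, `su3_identity` and `stream_bytes` are hypothetical, double precision is assumed, and the matrices are treated as generic 3×3 complex matrices with no unitarity enforced.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; these are not Grid's types.
using Complex = std::complex<double>;
using SU3 = std::array<std::array<Complex, 3>, 3>;

// 3x3 complex matrix product c = a * b.
inline SU3 su3_mult(const SU3& a, const SU3& b) {
  SU3 c{};  // value-initialized, so every entry starts at zero
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j)
      for (int k = 0; k < 3; ++k)
        c[i][j] += a[i][k] * b[k][j];
  return c;
}

// Streaming kernel: z[s] = x[s] * y[s] independently at every site s.
// Each site reads two matrices and writes one, with no data reuse
// between sites, which is what makes the test "streaming".
inline void su3_stream_mult(std::vector<SU3>& z,
                            const std::vector<SU3>& x,
                            const std::vector<SU3>& y) {
  assert(x.size() == y.size() && z.size() == x.size());
  for (std::size_t s = 0; s < x.size(); ++s) {
    z[s] = su3_mult(x[s], y[s]);
  }
}

// Helper returning the 3x3 identity matrix.
inline SU3 su3_identity() {
  SU3 m{};
  for (int i = 0; i < 3; ++i) m[i][i] = Complex(1.0, 0.0);
  return m;
}

// Bytes moved for n sites (two reads plus one write per site); dividing
// this by the measured kernel time gives an effective bandwidth.
inline std::size_t stream_bytes(std::size_t n_sites) {
  return 3 * n_sites * sizeof(SU3);
}
```

Because the loop body has no dependence between sites, the site loop is the natural candidate for offloading, whether through a CUDA kernel launch or through OpenACC or OpenMP directives.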

As of Lattice 2017, our partners included University of Edinburgh, University of Illinois, NVIDIA and Stony Brook University.

© The Authors, published by EDP Sciences, 2018

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (http://creativecommons.org/licenses/by/4.0/).