Optimization of the Brillouin operator on the KNL architecture
University of Wuppertal, Gaußstraße 20, D-42119 Wuppertal, Germany
2 IAS/JSC, Forschungszentrum Jülich GmbH, D-52425 Jülich, Germany
Published online: 26 March 2018
Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with Nc = 3 colors, Nv = 12 right-hand-sides, Nthr = 256 threads, on lattices of size 323 × 64, using exclusively OMP pragmas. Interestingly, the same routine performs quite well on Intel Core i7 architectures, too. Some observations on the much harderWilson fermion matrix-times-vector optimization problem are added.
© The Authors, published by EDP Sciences, 2018
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (http://creativecommons.org/licenses/by/4.0/).