Binning high-dimensional classifier output for HEP analyses through a clustering algorithm

Svenja Diekmann; Niclas Eich; Martin Erdmann

doi:10.1051/epjconf/202429506005

Proceedings

Open Access

EPJ Web of Conferences 295, 06005 (2024)
https://doi.org/10.1051/epjconf/202429506005

Svenja Diekmann^*, Niclas Eich^** and Martin Erdmann on behalf of the CMS Collaboration

RWTH Aachen University

^* e-mail: svenja.diekmann@cern.ch
^** e-mail: niclas.steve.eich@cern.ch

Published online: 6 May 2024

Abstract

The use of Deep Neural Networks (DNNs) as multi-class classifiers is widespread in modern HEP analyses. In standard categorisation methods, the high-dimensional output of the DNN is often reduced to a one-dimensional distribution by passing only the highest class score to the statistical inference method. Correlations with the other classes are thereby discarded. Moreover, common statistical inference tools require the classification values to be binned, a step that relies on the researcher’s expertise and is often nontrivial. To overcome the challenge of binning multiple dimensions while preserving the correlations in the per-event classification information, we perform K-means clustering on the high-dimensional DNN output to create bins without marginalising any axes. We evaluate our method in the context of a simulated cross section measurement at the CMS experiment, showing an increased expected sensitivity over the standard binning approach.
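The binning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy softmax scores, the choice of three classes and eight clusters, and all function names (`softmax`, `kmeans`, `nearest_centroid`, `histogram`) are assumptions made purely for illustration. The sketch runs a plain Lloyd-style K-means on the full score vectors and uses the resulting cluster index as the histogram bin, so no class axis is marginalised.

```python
import math
import random


def softmax(logits):
    """Turn raw network outputs into class scores that sum to one."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point; this index is the bin number."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))


def kmeans(points, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd iteration: assign points, recompute means, repeat."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(n_iter):
        labels = [nearest_centroid(p, centroids) for p in points]
        updated = []
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                dim = len(members[0])
                updated.append([sum(m[d] for m in members) / len(members)
                                for d in range(dim)])
            else:
                updated.append(centroids[k])  # keep the centroid of an empty cluster
        if updated == centroids:  # assignments stopped changing
            break
        centroids = updated
    return centroids


def histogram(labels, n_bins):
    counts = [0] * n_bins
    for lab in labels:
        counts[lab] += 1
    return counts


# Toy stand-in for a multi-class DNN output (assumption: 3 classes, 600 events).
N_CLASSES = 3
N_BINS = 8
toy_rng = random.Random(42)
scores = []
for _ in range(600):
    true_class = toy_rng.randrange(N_CLASSES)
    logits = [toy_rng.gauss(2.0 if c == true_class else 0.0, 1.0)
              for c in range(N_CLASSES)]
    scores.append(softmax(logits))

# Each cluster in the full score space acts as one bin of the final histogram.
centroids = kmeans(scores, N_BINS)
bin_index = [nearest_centroid(s, centroids) for s in scores]
yields = histogram(bin_index, N_BINS)
print(yields)
```

Once the centroids are fixed, `nearest_centroid` assigns any further event to a bin from its complete score vector, in contrast to a binning that only looks at the highest class score.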

© The Authors, published by EDP Sciences, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.