A prototype for the evolution of ATLAS EventIndex based on Apache Kudu storage

doi:10.1051/epjconf/201921404057

Proceedings

Open Access

EPJ Web of Conferences 214, 04057 (2019)
https://doi.org/10.1051/epjconf/201921404057

Zbigniew Baranowski¹^*, Luca Canali¹^**, Alvaro Fernandez Casani²^***, Elizabeth J Gallas³^****, Carlos Garcia Montoro²^†, Santiago González de la Hoz²^‡, Julius Hrivnac⁴^§, Fedor Prokoshin⁵^¶, Grigori Rybkine⁴^‖, Jose Salt²^**, Javier Sanchez²^†† and Dario Barberis⁶^‡‡ on behalf of the ATLAS Collaboration

¹ CERN, Geneva, Switzerland
² Institut de Física Corpuscular, Valencia, Spain
³ University of Oxford, Denys Wilkinson Bldg, Keble Rd, Oxford OX1 3RH, United Kingdom
⁴ LAL, Université Paris-Sud and CNRS/IN2P3, Orsay, France
⁵ Universidad Técnica Federico Santa María, Chile
⁶ Università di Genova and INFN, Genova, Italy

^* e-mail: zbigniew.baranowski@cern.ch
^** e-mail: luca.canali@cern.ch
^*** e-mail: Alvaro.Fernandez@ific.uv.es
^**** e-mail: elizabeth.gallas@physics.ox.ac.uk
^† e-mail: carlos.garcia@ific.uv.es
^‡ e-mail: sgonzale@ific.uv.es
^§ e-mail: Julius.Hrivnac@cern.ch
^¶ e-mail: Fedor.Prokoshin@cern.ch
^‖ e-mail: Grigori.Rybkine@cern.ch
^** e-mail: Jose.Salt@ific.uv.es
^†† e-mail: Javier.Sanchez@ific.uv.es
^‡‡ e-mail: Dario.Barberis@cern.ch

Published online: 17 September 2019

Abstract

The ATLAS EventIndex has been in operation since the beginning of LHC Run 2 in 2015. Like all software projects, its components have been constantly evolving and improving in performance. The main data store in Hadoop, based on MapFiles and HBase, can serve for the rest of Run 2, but new solutions are being explored for the future. Kudu offers an interesting environment, with a mixture of BigData and relational database features that looks promising at the design level. This environment is used to build a prototype to measure the scaling capabilities as functions of data input rates, total data volumes, and data query and retrieval rates. In these proceedings we report on the selected data schemas and on the current performance measurements with the Kudu prototype.

© The Authors, published by EDP Sciences, 2019

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.