Scaling the EOS namespace – new developments, and performance optimizations
CERN, Esplanade des Particules 1, 1217 Meyrin, Geneva
Published online: 17 September 2019
EOS is the distributed storage solution being developed and deployed at CERN with the primary goal of fulfilling the data needs of the LHC and its various experiments. Being in production since 2011, EOS currently manages around 256 petabytes of raw disk space and 3.4 billion files across several instances. Nowadays, EOS is increasingly being used as a distributed filesystem and file sharing platform, which poses scalability challenges on its legacy namespace subsystem, tasked with keeping track of all file and directory metadata on a particular instance.
In this paper we discuss said challenges, and present our solution which has recently entered production. We made several architectural improvements to the overall system design, the most important of which was introducing QuarkDB, a highly-available datastore capable of serving as the metadata backend for EOS, tailored to the needs of the namespace.
We also describe our efforts to provide latency and performance comparable to the legacy in-memory implementation: on the read path through extensive caching and prefetching, and on the write path through latency-hiding techniques built around a persistent, back-pressured local queue that batches writes towards the QuarkDB backend.
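The write-path idea above can be illustrated with a minimal sketch: a bounded queue that blocks producers when full (back-pressure) while a background worker drains writes in batches towards the backend. All names here are hypothetical and the example is deliberately simplified; the actual EOS implementation additionally persists the queue on local disk and flushes towards QuarkDB.

```python
import queue
import threading

class BatchingWriteQueue:
    """Illustrative latency-hiding write queue (not the EOS implementation)."""

    def __init__(self, flush, batch_size=64, max_pending=1024):
        # Bounded queue: put() blocks once max_pending items are buffered,
        # which is what provides back-pressure on writers.
        self._queue = queue.Queue(maxsize=max_pending)
        self._flush = flush          # callable receiving a list of writes
        self._batch_size = batch_size
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, write):
        # Returns quickly (hiding backend latency) unless the queue is full.
        self._queue.put(write)

    def close(self):
        # Sentinel tells the worker to flush remaining writes and stop.
        self._queue.put(None)
        self._worker.join()

    def _drain(self):
        stopped = False
        while not stopped:
            # Block for the first item, then opportunistically gather more
            # without blocking, so writes reach the backend in batches.
            batch = [self._queue.get()]
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            if batch[-1] is None:    # sentinel is always enqueued last
                batch.pop()
                stopped = True
            if batch:
                self._flush(batch)
```

A single drain thread preserves write ordering, and batching amortizes the per-request round-trip cost to the backend, which is the essence of the latency-hiding technique described above.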
© The Authors, published by EDP Sciences, 2019
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.