Disaster recovery of the INFN Tier–1 data center: lesson learned
INFN-CNAF, v.le B. Pichat 6/2 – 40100
* e-mail: firstname.lastname@example.org
Published online: 17 September 2019
The year 2017 was most likely a turning point for the INFN Tier-1. Early in the morning of November 9th 2017, a large pipe of the city aqueduct, located under the road next to CNAF, broke. As a consequence, a river of water and mud flowed towards the Tier-1 data center. The water level did not exceed the safety threshold of the waterproof doors but, owing to the porosity of the external walls and the floor, the water found its way into the data center. The flooding compromised almost all activities and represented a serious threat to the future of the Tier-1 itself. The most affected part of the data center was the electrical room, which houses all the switchboards for both power lines and for the continuity systems, but the damage also extended to all the IT systems, including all the storage devices and the tape library. After a careful damage assessment, an intense recovery activity was launched, aimed not only at restoring the services but also at securing the data stored on disks and tapes. After nearly two months, in January 2018, we were able to start gradually reopening all the services, including part of the farm and the storage systems. The long tail of the recovery (tape recovery, second power line) lasted until the end of May. As a short-term consequence, we have started a deep consolidation of the data center infrastructure so that it can also cope with this type of incident; for the medium and long term, we are working to move to a new, larger location that can also accommodate the foreseen increase of resources for HL-LHC.
© The Authors, published by EDP Sciences, 2019
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.