Mid-December software patch exposed supercomputer storage bug, with some research lost for good
Kyoto University students using a Cray/HPE supercomputer for research lost 77 terabytes (TB) of data – much of it for good – after a software update exposed a fatal problem with the system’s storage. Hewlett Packard Enterprise (HPE) said it “took 100% responsibility” for the error.
HPE pushed a software update in mid-December intended to improve file visibility. But an issue with the shell script execution caused files to be deleted instead.
In a statement (in Japanese), Kyoto University confirmed the loss of approximately 77 TB of files from the LARGE0 file system. In all, the University said it lost about 34 million files. The wipeout affected 14 research groups. And four of those groups aren’t getting those lost files back, according to the school.
About 28 TB, or about 25 million files, can’t be recovered “due to the absence of backup.” A smaller amount of actual research data, about 8 TB, is permanently gone, according to the school.
The school has not identified which departments lost data permanently.
Script error deleted actual files, not log entries
HPE explained the problem in a PDF statement posted to the University site. An update intended to clean up old log files caused actual files to be deleted instead. The company “took 100% responsibility” for the error. It also said it would do its best to compensate those impacted.
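Neither statement reproduces the script itself. As a purely illustrative sketch, not HPE's code, a log cleanup routine can refuse to run unless its target directory is non-empty and sits strictly inside an expected root, and can default to a dry run that only lists what it would delete. The function names, the `*.log` pattern, and the age threshold below are all assumptions made for the example.

```python
import time
from pathlib import Path


def find_stale_logs(log_dir, allowed_root, max_age_days):
    """Return *.log files under log_dir last modified more than max_age_days ago.

    Raises ValueError if log_dir is empty or is not strictly inside allowed_root,
    so an unset or mistyped path cannot widen the scan to real data.
    """
    if not str(log_dir).strip():
        raise ValueError("log_dir is empty; refusing to scan")
    root = Path(allowed_root).resolve()
    target = Path(log_dir).resolve()
    if target == root or root not in target.parents:
        raise ValueError(f"{target} is not strictly inside {root}")
    cutoff = time.time() - max_age_days * 86400
    return sorted(
        p for p in target.rglob("*.log")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def delete_stale_logs(log_dir, allowed_root, max_age_days, dry_run=True):
    """Delete stale logs; with dry_run=True (the default) only report them."""
    stale = find_stale_logs(log_dir, allowed_root, max_age_days)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

Calling `delete_stale_logs("", "/LARGE0", 10)` in this sketch raises `ValueError` rather than scanning the current directory, which is the kind of guard the example is meant to show.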
Kyoto University said that it has stopped backups to the affected system until the end of January, to make sure this doesn’t happen again. The school also announced a new mitigation strategy based on incremental backups, and offered one final suggestion.
“On the other hand, it is difficult to take complete measures including the possibility of file loss due to equipment failure or disaster, so even if you are a user, please back up important files to another system,” said the school (translated).
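The school's statement does not detail how its incremental backups will work. For users following the advice to keep their own copies, a minimal sketch of the general idea is to copy only files that are new or changed since the last run, and never to remove anything from the backup side. The function name and the size-and-modification-time comparison are assumptions for illustration, not the university's method.

```python
import shutil
from pathlib import Path


def incremental_backup(source, dest):
    """Copy new or changed files from source to dest; return their relative paths.

    Files deleted from source are left untouched in dest, so an accidental
    deletion on the source side does not propagate to the backup.
    """
    src = Path(source)
    dst = Path(dest)
    copied = []
    for f in sorted(src.rglob("*")):
        if not f.is_file():
            continue
        rel = f.relative_to(src)
        target = dst / rel
        if target.exists():
            s, t = f.stat(), target.stat()
            # Skip files whose backup copy is the same size and at least as recent.
            if t.st_size == s.st_size and t.st_mtime >= s.st_mtime:
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied
```

A second run over an unchanged source copies nothing, and a file removed from the source afterward still exists in the backup.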
Kyoto University, one of the highest-ranked educational institutions in the world, conducts ongoing research in chemistry and physics. The school counts more than a dozen Nobel laureates among its researchers.
The Cray/HPE system is one of several supercomputers used for research by the university. In early 2021 Dell Technologies announced that 135 of its EMC PowerEdge servers act as the basis for Kyoto University’s Yukawa-21 supercomputer. Yukawa-21 lives at the Yukawa Institute for Theoretical Physics (YITP). Each PowerEdge server sports Intel Xeon processors, Nvidia V100 Tensor Core GPUs and high-speed interconnect by Dell.
HPE is central to another Japanese effort to modernize business and government using hybrid cloud technology. NTT’s Business Solutions unit, part of NTT West Group, is working with Microsoft Azure and HPE to bring hybrid cloud to NTT’s Regional Revitalization Cloud. HPE’s GreenLake Infrastructure as a Service (IaaS) platform will serve as the edge-to-cloud platform.