
European nuclear physics lab CERN is ramping up operations at its new Hungarian remote-hosting site, which is designed to act as a datacentre extension for the Geneva hub.
CERN is in the process of deploying 26PB of disk storage across the two installations and testing the effectiveness of its preparations for remotely running the Wigner colocation site in Budapest.
At the 3.5MW Geneva datacentre, CERN staff have been responsible for installing and running the hardware, but at the Wigner site systems administration will be handled remotely.
“We’re trying to learn lessons from the recent past in that when we install hardware there can easily be errors in the installation and cabling process,” said Olof Bärring, leader of the CERN IT department’s facility planning and procurement section.
“If you get the wrong network registration of a server, it can stop the whole process and you have to go in and correct that before you can continue the process of registering your servers and start the burn-in,” he said.
“This is something for which we are developing software but we want to make sure the instructions to the people doing this [in Hungary] are clear enough to minimise the number of mistakes they can make.”
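The failure mode Bärring describes, a server whose network registration does not match what was physically installed, is at heart a record-matching problem: compare the installer's manifest against what the network actually sees before burn-in starts. The sketch below uses an entirely hypothetical data model and field names; it is not CERN's actual software, just an illustration of the kind of pre-burn-in check involved.

```python
# Hypothetical sketch of a pre-burn-in registration check.
# The data model (serial, mac, rack) is illustrative, not CERN's tooling.

def find_registration_errors(manifest, registrations):
    """Compare an installation manifest against network registrations.

    manifest:      {serial: {"mac": ..., "rack": ...}} - what was installed
    registrations: {serial: {"mac": ..., "rack": ...}} - what the network sees
    Returns a list of human-readable errors; an empty list means burn-in
    can proceed.
    """
    errors = []
    for serial, expected in manifest.items():
        actual = registrations.get(serial)
        if actual is None:
            errors.append(f"{serial}: not registered on the network")
            continue
        for field in ("mac", "rack"):
            if expected[field] != actual[field]:
                errors.append(
                    f"{serial}: {field} mismatch "
                    f"(expected {expected[field]}, got {actual[field]})"
                )
    return errors
```

Running a check like this before burn-in is what turns a cabling mistake from a process-stopping fault into a line item on a punch list.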
Large Hadron Collider shutdown
In 2012, CERN generated 27PB of raw data from its four principal experiments and 3PB from other research programmes. About 99 percent of the data collected by the Large Hadron Collider detectors is thrown away. In total, about 100PB is archived, with 88PB on tape and 13PB on disk.
The Large Hadron Collider particle accelerator shut down in February for a two-year period of repair and upgrades.
“After the shutdown, the experiments will actually produce more data than they did in the past,” Bärring said.
“There was a lot of data from last year that they haven’t analysed, and when they come back in 2015 they’ll generate more data.”
The 27PB of storage that Bärring’s team is installing across the Geneva and Budapest sites – which are connected by two independent 100Gbps wide-area links – consists of 24-bay SAS JBOD expansion units fitted with 3TB hard disks and LSI SAS 9205-8e host-bus adapters.
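Those figures make the scale of the rollout easy to sanity-check: each 24-bay unit with 3TB disks holds 72TB raw, so 27PB works out to roughly 375 expansion units. The arithmetic below uses decimal units (1PB = 1,000TB) and ignores filesystem and redundancy overheads, which would push the real unit count higher.

```python
# Back-of-the-envelope capacity arithmetic for the deployment described above.
BAYS_PER_JBOD = 24   # 24-bay SAS JBOD expansion unit
DISK_TB = 3          # 3TB hard disks
TARGET_PB = 27       # total storage being installed

tb_per_jbod = BAYS_PER_JBOD * DISK_TB             # 72 TB raw per unit
units_needed = (TARGET_PB * 1000) / tb_per_jbod   # 1 PB = 1000 TB (decimal)
print(tb_per_jbod, round(units_needed))           # 72 TB/unit, ~375 units
```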
CERN has two storage-management systems. One is a mass-storage system written at the Geneva datacentre, with a tape backend that holds the custodial copy of the data. All the raw data collected from, for example, the Large Hadron Collider goes there, together with all the important intermediate data generated from that raw information.
The second system, also written at Geneva, is disk-only and is used for high-turnover analysis.
“The phase when the scientists are analysing the data can involve a relatively random type of access and this is not very convenient for a sequential type of tape-based, mass-storage system. Therefore we have this additional system for serving that,” Bärring said.
“These two systems don’t talk directly to each other but files are sometimes copied between them when data analysis is going on.”
CERN’s power limitations
Because of the power restrictions at the Geneva datacentre, some of the 10PB to 20PB of disks that Bärring tends to buy each year are used to replace obsolete storage.
“Not because it’s not working but more that it’s inefficient from the electrical power point of view. So the watts per terabyte is an important metric for us because we’re very power-limited in our current centre here at CERN,” he said.
“We have been running more or less on the limit for a couple of years, which means whatever we buy for the centre we have to throw out some less power-efficient equipment.”
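The watts-per-terabyte metric Bärring mentions is simple to compute when weighing new purchases against the equipment they would displace. The figures below are invented for illustration and are not CERN's actual numbers; the point is only that denser disks drive the ratio down sharply.

```python
# Illustrative watts-per-terabyte comparison. All numbers are invented
# for illustration; they are not CERN's actual measurements.

def watts_per_tb(total_watts, total_tb):
    """Power efficiency of a storage unit: lower is better."""
    return total_watts / total_tb

# A hypothetical older array versus a hypothetical 24 x 3TB JBOD.
old_array = watts_per_tb(total_watts=450, total_tb=36)
new_array = watts_per_tb(total_watts=260, total_tb=72)
print(round(old_array, 2), round(new_array, 2))  # 12.5 vs ~3.61 W/TB
```

On numbers like these, swapping one old array for one new unit frees power while doubling capacity, which is exactly the trade a power-capped datacentre has to keep making.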
CERN organises its computing activities into tiers, with the Worldwide LHC Computing Grid’s tier 0 located at Geneva and now Budapest. Out of a total of 140 centres in 35 countries, there are 11 major tier 1 sites around the world connected to CERN via high-speed links, plus smaller tier 2 and tier 3 centres linked via the internet.