Swiss research lab Cern explains how cloud computing could help tackle the petabytes of data generated by the Large Hadron Collider particle accelerator.
When your day job is figuring out the workings of the universe you need some heavy duty computing power at your disposal.
That's why researchers at Cern, the Swiss research lab that is home to the Large Hadron Collider (LHC) particle accelerator, are drafting in some additional muscle from the cloud.
The LHC experiments alone generate 22PB of data every year, which needs sifting and probing in Cern's quest to answer fundamental questions such as 'What is the origin of mass?'.
Cern already supplements its processing power via a network of 150 computing centres, known as the Worldwide LHC Computing Grid (WLCG), which stores and analyses its research data. The WLCG puts some 150,000 processors at Cern's disposal, but the research institute is examining whether it could double that number by turning to cloud computing.
The research institute is taking part in the Helix Nebula initiative, a pilot project designed to kick-start the European cloud computing industry by carrying out scientific research in the cloud.
Data from LHC experiments will be handled by various EU-based cloud providers over the next two years, as part of the project's goal of examining the role that cloud computing could play in EU research.
Bob Jones is Cern's head of openlab, the public-private partnership that helps Cern identify new information technologies that could benefit the lab. He said that Cern needs to find new sources of computing power, as demand will soon outstrip supply in its on-site computing facilities.
"We can consume as much computing capacity as we can get our hands on," he said.
"On the Cern site we can't increase the size of our datacentre much more. Two or three years down the line we're going to be limited by space and by electrical consumption.
"We have to think of what other options are open to us and the on-demand, elastic cloud computing provided by a number of these companies seems like a very good option for us to explore."
To test the suitability of cloud platforms for its research, Cern is using them to run simulations of the Atlas experiment within the LHC. So far Cern has run the simulations using 1,000 virtual machines running in parallel on a platform provided by vendor CloudSigma. Jones said that Cern will trial various cloud platforms offered by providers taking part in the Helix Nebula project before deciding whether to shift larger data sets and more of its software stack to the cloud.
"We're working with one experiment at the moment, if this proves successful then we can expand it to all of our different software applications."
During the next two years Cern will scrutinise services provided by the Helix Nebula cloud vendors, comparing them to the cost of providing compute power and storage in-house, as well as considering issues such as scalability, data security and access.
"Depending on how well it performs we could flood these commercial services. Currently we are using 150,000 CPUs continuously [via the WCLG grid] and we could potentially double that," said Jones.
Other research agencies trialling cloud-based research as part of the EC-backed Helix Nebula project are the European Space Agency (ESA), which will carry out satellite-based research into natural disasters, and the European Molecular Biology Laboratory (EMBL), which will carry out genetic research.
Jones said that the organisations' combined appetite for data storage and processing could kick-start the cloud market in Europe and provide a solid base of business for cloud vendors to grow from. About 13 technology vendors - including major suppliers like SAP and Atos Origin - have so far signed up to take part in Helix Nebula.
"We are trying to boot strap a European cloud market, we think that between us as large organisations we have enough critical mass to encourage these companies to come forward and form a cloud market in Europe," said Jones.
"If we can demonstrate that it is technically and financially feasible for world-leading research organisations like Cern, the ESA and EMBL to make use of these resources then that will attract others."
More importantly for Cern, Jones said, the additional computing power and storage promised by the cloud could help researchers analyse LHC data more rapidly.
"In the next few years we're going to have in excess of 20PB of data each year, which must be stored, processed and distributed to the thousands of physicists around the world.
"By giving them more compute capacity we give them more options as to what they can look for inside the physics data - they can do better data mining and analysis.
"We could potentially speed up the time to discovery by reducing the time it takes to analyse the data coming off accelerators."