Over three decades, video cards have transformed computer graphics from monochrome line drawings to near photo-realistic renderings.

But the processing power of the GPU is increasingly being used to tame the huge volumes of data generated by modern industry and science. And now a project to build the world’s largest radio telescope is considering using a GPU cluster to stitch together more than an exabyte of data each day.

In excess of an exabyte – more information than passes across the internet in 24 hours – is expected to be gathered by the Square Kilometre Array (SKA) every day following its completion in 2024. The SKA will be an array of 3,000 radio telescopes that will gather cosmic emissions in an attempt to see the universe a few hundred million years after the Big Bang – further back in time than any telescope has glimpsed.

SKA researchers believe a GPU-based supercomputer may be a good fit for piecing together the data from the array of dishes to allow it to function as a single radio telescope.

Such a task couldn’t be carried out by a CPU-based supercomputer with a conventional architecture because data typically flows into and out of CPUs many times more slowly than it does for GPUs – too slowly to cope with the multiple data streams from the array’s telescopes.

Albert-Jan Boonstra, director of R&D at Astron, the Netherlands Institute for Radio Astronomy, said: “For the correlation, where the signals from all the telescopes are being correlated, we need all the bandwidth we can get, so the I/O is a serious constraint.

“GPUs would be a good candidate; the bottleneck was the I/O, but it’s now coming to a region where it’s worth considering them for correlation and post-processing.”

As well as having the bandwidth to correlate the radio telescope signals, GPUs typically pack hundreds of cores and are able to handle thousands of software threads simultaneously – significantly more than CPUs. The GPU’s power comes at the expense of flexibility; while CPUs are general-purpose chips, GPUs can only carry out a limited set of computing tasks.

But because GPUs need to manipulate complex images on a screen in real time, they generally excel at repeatedly performing the same operation on large volumes of data. With a bit of clever coding, that trait could be exploited to correlate data from 3,000 telescopes.
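To make that concrete, here is a minimal, illustrative sketch of an “FX”-style correlation step in NumPy. The dish count, sample counts and noise levels are invented for illustration, and a real correlator would run this per-channel multiply-accumulate on GPU hardware across millions of antenna pairs rather than in NumPy:

```python
import numpy as np

# Two simulated antenna voltage streams that share a common sky signal
# plus independent receiver noise (all numbers invented for illustration).
rng = np.random.default_rng(0)
n_samples = 4096
sky = rng.normal(size=n_samples) + 1j * rng.normal(size=n_samples)
ant_a = sky + 0.1 * (rng.normal(size=n_samples) + 1j * rng.normal(size=n_samples))
ant_b = sky + 0.1 * (rng.normal(size=n_samples) + 1j * rng.normal(size=n_samples))

# FX-style correlation: FFT each stream, then multiply one spectrum by
# the conjugate of the other. Every frequency channel gets exactly the
# same multiply-accumulate -- uniform, data-parallel work of the kind
# GPUs are built for.
spec_a = np.fft.fft(ant_a)
spec_b = np.fft.fft(ant_b)
visibility = spec_a * np.conj(spec_b)   # one antenna pair ("baseline")

# With 3,000 dishes, this identical operation repeats for every pair.
n_dishes = 3000
n_baselines = n_dishes * (n_dishes - 1) // 2
print(n_baselines)  # 4,498,500 pairs, all doing the same arithmetic
```

Because every baseline and every frequency channel runs the same arithmetic on different data, the workload maps naturally onto the thousands of threads a GPU can keep in flight.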

The brute force of GPUs when it comes to repetitive tasks is being tapped for jobs as diverse as financial modelling and research in oil and gas exploration. The UK’s largest GPU-based supercomputer, named Emerald, came into service at the Science and Technology Facilities Council’s Rutherford Appleton Laboratory at the beginning of July. The machine boasts 372 Nvidia Tesla M2090 GPUs and is ranked the 159th most powerful supercomputer in the world, with 6,960 cores and a capability of 114 teraflops. Researchers will use the supercomputer to tackle areas ranging from healthcare to astrophysics.

The final decision on what technologies to use to correlate the signals from the SKA has not been taken, and researchers are also investigating supercomputers with an architecture that maximises the I/O of each node, as well as customised Field Programmable Gate Arrays (FPGAs).

Powering up the SKA

After the data from the Square Kilometre Array has been correlated it will be processed and broken down – with about 10PB of data each day being stored. That work will take place at datacentres in South Africa and Australia, next to the two sites where the array telescopes will be situated. Data will then be sorted and packaged into smaller chunks and sent to regional datacentres for use in cosmological and physics research.

However, new technologies will need to be developed to handle the data from the SKA. If the SKA were to rely on conventional computing architecture for its data processing and storage, its main datacentres would likely each draw more than a gigawatt of power and require two dedicated power stations to operate.

Astron and IBM are engaged in a five-year €32.9m project to develop new types of processors, power supplies, storage systems and network technologies to handle the data without unmanageable power requirements.

Since moving data is one of the most power-hungry tasks carried out in computing, the research is largely focusing on technologies that can reduce the cost of moving data or reduce the amount of data that needs to be moved.

“Most of the energy goes into transportation of the data, rather than the processing of it. We have to look into designs so we transport data as little as possible,” said Boonstra.
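As a back-of-the-envelope illustration of why that matters, consider accumulating correlation products at the telescope site instead of shipping raw samples onward. The sample rate, channel count and integration time below are invented placeholders, not SKA specifications:

```python
# Illustrative figures only -- not SKA specifications.
sample_rate_hz = 1_000_000_000   # assumed 1 gigasample/s from one antenna
bytes_per_sample = 2             # assumed 8-bit complex samples
integration_s = 1.0              # average the correlator output for 1 s

# Raw stream: every sample travels onward for later processing.
raw_bytes = sample_rate_hz * bytes_per_sample * integration_s

# Accumulated on site: one complex number per frequency channel per
# integration period is all that has to leave the antenna station.
n_channels = 65_536              # assumed frequency channels
bytes_per_visibility = 8         # complex64 accumulator per channel
integrated_bytes = n_channels * bytes_per_visibility

reduction = raw_bytes / integrated_bytes
print(f"{reduction:,.0f}x less data to move")
```

Under these assumed numbers, processing close to the antennas cuts the data that must be transported by a factor of a few thousand, which is the spirit of the design Boonstra describes.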

The project is researching 3D stacking of chips to reduce the distance between processors and using photonic interconnects between processors to reduce power demands when funnelling large volumes of data.

Researchers are also looking to develop software that minimises the movement of data, for instance by aligning software and hardware so more operations can be carried out in one place.
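A small, hypothetical sketch of that idea: fusing several element-wise steps into a single pass so intermediate values stay in registers instead of making extra round trips through memory. The array size and operations are invented for illustration:

```python
import numpy as np

x = np.arange(100_000, dtype=np.float64)

# Unfused pipeline: every step materialises a full intermediate array,
# so the data travels to and from memory once per step.
step1 = x * 2.0
step2 = step1 + 1.0
total_unfused = step2.sum()

# "Fused" pipeline: a single pass touches each element once and keeps
# the intermediate value in a register -- less data movement, same
# result. (A real system would fuse compiled GPU kernels, not use a
# Python loop.)
total_fused = 0.0
for v in x:
    total_fused += v * 2.0 + 1.0
```

Both versions compute the same total; the fused one simply moves each element through the memory hierarchy once instead of three times, which is exactly the kind of software/hardware alignment the researchers are after.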

With the right mix of technologies Boonstra believes the power demands for each main datacentre can be reduced to about 20MW or less.

There’s still a lot of work to be done, but Boonstra said the payoff will be worth it, as the SKA will enable research into the origins of the cosmos that isn’t possible today.

“The SKA is expected to make a detailed map of the earliest phases of the universe, and that’s a region no telescope has examined before. This is new science.”