Serving up coverage of the London 2012 Olympic Games was a big data challenge that the Press Association (PA) couldn’t afford to get wrong.
As the agency supplying Olympic coverage to both UK media and the website of Games organiser Locog, PA needed to deliver accurate information quickly and reliably without buckling under pressure.
The agency’s first challenge was how to serve not only the results but also contextual data, such as an athlete’s previous form and photograph, to give audiences the breadth of information they expect.
Each piece of PA’s Games coverage was tagged with machine-queryable metadata that provided contextual information, such as the names of the athletes taking part in a particular event.
PA’s Olympics content was stored in an XML repository provided by MarkLogic. A semantic repository system integrated with the XML store could be queried to pull out linked content based on metadata. The system also tied PA content to related information from the International Olympic Committee.
This setup allowed PA’s customers to choose the mix of information they wanted for their Olympics coverage by making requests through APIs.
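To illustrate how metadata tagging lets content be pulled out by athlete and then trimmed to the mix a customer asks for, here is a minimal Python sketch. It is not PA’s or MarkLogic’s system: the XML layout, story ids, placeholder athlete names and the `stories_for_athlete` and `build_feed` helpers are all hypothetical, and a simple in-memory ElementTree search stands in for the semantic repository’s queries.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata-tagged stories. Element names, ids and athlete
# names are illustrative assumptions, not PA's or MarkLogic's schema.
REPOSITORY = """
<stories>
  <story id="s1">
    <headline>Heat result</headline>
    <meta><event>100m</event><athlete>Athlete A</athlete></meta>
  </story>
  <story id="s2">
    <headline>Pool report</headline>
    <meta><event>200m freestyle</event><athlete>Athlete B</athlete></meta>
  </story>
  <story id="s3">
    <headline>Final preview</headline>
    <meta><event>100m</event><athlete>Athlete A</athlete><athlete>Athlete C</athlete></meta>
  </story>
</stories>
"""


def stories_for_athlete(root, athlete):
    """Return every story whose metadata names the given athlete."""
    return [
        story
        for story in root.iter("story")
        if athlete in (a.text for a in story.findall("meta/athlete"))
    ]


def build_feed(root, athlete, fields):
    """Return only the fields a (hypothetical) customer request asked for."""
    feed = []
    for story in stories_for_athlete(root, athlete):
        item = {"id": story.get("id")}
        for field in fields:
            # Look for the field in the story body first, then in its metadata.
            item[field] = story.findtext(field) or story.findtext(f"meta/{field}")
        feed.append(item)
    return feed


root = ET.fromstring(REPOSITORY.strip())
print(build_feed(root, "Athlete A", ["headline", "event"]))
```

Because the athlete tags travel with each story, a customer asking for a different mix (say, only event names) changes the `fields` list rather than the stored content.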
To handle an average of 50,000 requests for information per second, PA used the Microsoft Azure cloud platform running in seven datacentres spread across the globe.
“We had hundreds of thousands of instances managing and caching information. It’s very cheap and effective, it’s 0.08p per hour and you can turn them on and off as you need them,” said John O’Donovan, director of technical architecture and development at PA, speaking at Cloud Expo Europe in London yesterday.
The switch to keeping content in an XML store, rather than storing it in a traditional relational database and then translating that data back into XML before sending it out to customers, had made implementing the new system far simpler, said O’Donovan.
“The problem with that is having to sit down and design a relational database model that can represent everything that’s in the XML. That takes quite a lot of time, you have to build all of your input/output extenders and map XML objects into relational stores.”
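The extra work O’Donovan describes can be sketched as a round trip: shred each XML story into relational tables on the way in, then rebuild the XML on the way out. This is a toy illustration only; the table layout and the `shred_into_tables` and `rebuild_xml` helpers are hypothetical, and Python’s built-in sqlite3 stands in for whatever relational database a real system would use.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical story; the XML layout is an illustrative assumption.
STORY_XML = (
    '<story id="s1"><headline>Final result</headline>'
    "<athlete>Athlete A</athlete><athlete>Athlete B</athlete></story>"
)


def shred_into_tables(conn, xml_text):
    """Map one XML story onto relational tables (the step an XML store skips)."""
    story = ET.fromstring(xml_text)
    story_id = story.get("id")
    conn.execute(
        "INSERT INTO story (id, headline) VALUES (?, ?)",
        (story_id, story.findtext("headline")),
    )
    # Repeated <athlete> elements need their own table in a relational model.
    conn.executemany(
        "INSERT INTO story_athlete (story_id, athlete) VALUES (?, ?)",
        [(story_id, a.text) for a in story.findall("athlete")],
    )


def rebuild_xml(conn, story_id):
    """Translate relational rows back into XML before sending to a customer."""
    (headline,) = conn.execute(
        "SELECT headline FROM story WHERE id = ?", (story_id,)
    ).fetchone()
    story = ET.Element("story", id=story_id)
    ET.SubElement(story, "headline").text = headline
    rows = conn.execute(
        "SELECT athlete FROM story_athlete WHERE story_id = ? ORDER BY rowid",
        (story_id,),
    )
    for (athlete,) in rows:
        ET.SubElement(story, "athlete").text = athlete
    return ET.tostring(story, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE story (id TEXT PRIMARY KEY, headline TEXT)")
conn.execute("CREATE TABLE story_athlete (story_id TEXT, athlete TEXT)")
shred_into_tables(conn, STORY_XML)
print(rebuild_xml(conn, "s1"))
```

Even in this toy case, every new element in the XML means a schema change plus matching insert and rebuild code; keeping the XML as-is removes both mapping steps.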
Removing the relational database from the equation cut the time needed to get the new content delivery system off the ground from 100 to 34 man-days, he said.
“There are very few areas where you can get that level of difference.”
The metadata system used during the Games is being extended to all of PA’s wire and output content and will go live within the next two months.