Porting the MG-RAST Metagenomic Data Analysis Pipeline to the Cloud
Computational biology applications typically favor a local, cluster-based, integrated computational platform. The authors present a lessons-learned report on scaling up a metagenomics application that had outgrown the available local cluster hardware. In their example, removing a number of assumptions tied to tight integration made it possible to expand beyond a single administrative domain, increase the number and types of machines available to the application, and improve its scaling properties. The assumptions made in designing the computational client make it well suited for deployment as a virtual machine inside a cloud. This paper discusses the decision process and describes the suitability of deploying various bioinformatics computations to distributed heterogeneous machines.