An Adaptive Workflow Scheduling Scheme Based on an Estimated Data Processing Rate for Next Generation Sequencing in Cloud Computing
The cloud environment makes it possible to analyze large data sets on a scalable computing infrastructure. In bioinformatics, applications are composed of complex workflow tasks that require large amounts of data storage as well as compute-intensive parallel workloads. Many distributed solutions have been proposed for such workloads. However, they focus on static resource provisioning with a batch processing scheme in a local computing farm and data storage. For a large-scale workflow system, it is both inevitable and valuable to outsource all or part of its tasks to public clouds to reduce resource costs.