Scheduling Workflow Applications Based on Multi-Source Parallel Data Retrieval in Distributed Computing Networks
Many scientific experiments are carried out in collaboration with researchers around the world in order to use existing infrastructures and to conduct experiments at massive scale. Data produced by such experiments are thus replicated and cached at multiple geographic locations. This gives rise to new challenges when selecting distributed data and compute resources so that the execution of applications is time- and cost-efficient. Existing heuristic techniques select the 'best' data source for retrieving data to a compute resource and subsequently perform the task-resource assignment.