A Data Placement Strategy in Scientific Cloud Workflows
In scientific cloud workflows, large amounts of application data need to be stored in distributed data centers. To store these data effectively, a data manager must intelligently select the data centers in which they will reside. This choice is not available, however, for data that must remain at a fixed location. When one task needs several datasets located in different data centers, moving large volumes of data between those centers becomes a challenge. In this paper, the authors propose a matrix-based k-means clustering strategy for data placement in scientific cloud workflows.
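The abstract does not spell out the authors' algorithms. The following is therefore only a minimal sketch of the general idea, under assumptions of our own. We assume a dependency matrix in which entry (i, j) counts the tasks that use both dataset i and dataset j. Each dataset's row serves as its feature vector for ordinary Euclidean k-means with k data centers. Fixed-location datasets are pinned to their data center and never reassigned. The function names `dependency_matrix`, `place`, and `movements`, the `seeds` parameter, and the movement-counting rule are all hypothetical illustrations, not the paper's method.

```python
"""Hypothetical sketch: k-means over an assumed dataset dependency matrix.

Assumptions (not from the paper): dep[i][j] = number of tasks using both
dataset i and dataset j; each row of dep is that dataset's feature vector;
fixed-location datasets are pinned to their data center.
"""
from typing import Dict, List, Set


def dependency_matrix(tasks: List[Set[int]], n: int) -> List[List[int]]:
    """dep[i][j] counts tasks that read both i and j (dep[i][i] = uses of i)."""
    dep = [[0] * n for _ in range(n)]
    for used in tasks:
        for i in used:
            for j in used:
                dep[i][j] += 1
    return dep


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def place(tasks: List[Set[int]], n: int, k: int, fixed: Dict[int, int],
          seeds: List[int], iters: int = 20) -> List[int]:
    """Return a data center index in [0, k) for each of the n datasets."""
    dep = dependency_matrix(tasks, n)
    # Hypothetical seeding: centroid c starts at the row of dataset seeds[c].
    centroids = [[float(v) for v in dep[s]] for s in seeds]
    assign = [-1] * n
    for _ in range(iters):
        # Assignment step: pinned datasets stay put, others go to nearest centroid.
        new = [fixed[i] if i in fixed
               else min(range(k), key=lambda c: sq_dist(dep[i], centroids[c]))
               for i in range(n)]
        if new == assign:
            break  # converged: no dataset changed data center
        assign = new
        # Update step: each centroid becomes the mean row of its members.
        for c in range(k):
            members = [dep[i] for i in range(n) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def movements(tasks: List[Set[int]], assign: List[int]) -> int:
    """Count transfers if each task runs where most of its inputs already are."""
    total = 0
    for used in tasks:
        centers = [assign[d] for d in used]
        total += len(centers) - max(centers.count(c) for c in set(centers))
    return total


if __name__ == "__main__":
    tasks = [{0, 1}, {0, 1}, {2, 3}, {2, 3}, {1, 2}]
    layout = place(tasks, n=4, k=2, fixed={0: 0, 3: 1}, seeds=[0, 3])
    print(layout)                           # [0, 0, 1, 1]
    print(movements(tasks, layout))         # 1
    print(movements(tasks, [0, 1, 0, 1]))   # 5 for a naive alternating layout
```

In this toy workload, datasets that are used together end up in the same data center. Only the single task that spans both groups still needs one transfer. A layout that ignores dependencies needs five.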