Association for Computing Machinery
One of the main goals of cloud and grid infrastructures is to make their services easily accessible and attractive to end-users. In this paper, the authors investigate the problem of supporting keyword-based search for discovering software files installed on the nodes of large-scale, federated grid and cloud computing infrastructures. They address a number of challenges that arise from the unstructured nature of software and the unavailability of software-related metadata in large-scale networked environments. They present Minersoft, a harvester that visits grid and cloud infrastructures, crawls their file systems, identifies and classifies software files, and discovers implicit associations between them.
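
To make the crawl, classify, and keyword-search pipeline described above concrete, the following is a minimal single-machine sketch in Python. It is not Minersoft's implementation. The category names, the extension table, and the functions `classify`, `tokenize`, `harvest`, and `search` are all hypothetical and chosen only for illustration. The sketch classifies files by extension or executable bit and builds an inverted index over path tokens. It does not attempt the discovery of implicit associations between files.

```python
import os
import re
import stat
from collections import defaultdict

# Hypothetical extension-to-category table (an assumption for illustration,
# not the classification scheme used by Minersoft).
EXTENSION_CLASSES = {
    ".so": "library",
    ".a": "library",
    ".py": "source",
    ".c": "source",
    ".h": "source",
    ".sh": "script",
    ".txt": "documentation",
    ".md": "documentation",
}


def classify(path):
    """Assign a coarse category from the extension, else the executable bit."""
    ext = os.path.splitext(path)[1].lower()
    if ext in EXTENSION_CLASSES:
        return EXTENSION_CLASSES[ext]
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Unreadable entries and broken symlinks fall through to "other".
        return "other"
    if stat.S_ISREG(mode) and mode & stat.S_IXUSR:
        return "executable"
    return "other"


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def harvest(root):
    """Walk the tree under root; return (inverted index, path -> category)."""
    index = defaultdict(set)
    classes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            classes[path] = classify(path)
            # Index tokens of the path relative to root, so that directory
            # names also act as keywords for the files they contain.
            for token in tokenize(os.path.relpath(path, root)):
                index[token].add(path)
    return index, classes


def search(index, query):
    """Return the paths whose tokens contain every keyword in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

Calling `harvest` on a directory and then `search(index, "blas lib")` would return every indexed path whose relative name contains both tokens. Indexing only path tokens is a deliberate simplification: the abstract notes that software-related metadata is unavailable, so a real harvester would need richer signals than file names alone.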