Data Management

Coordinating Computation and I/O in Massively Parallel Sequence Search

Date Added: Jun 2010
Format: PDF

With the explosive growth of genomic information, searching sequence databases has become one of the most computation- and data-intensive scientific applications. The authors' previous studies suggested that parallel genomic sequence search exhibits highly irregular computation and I/O patterns. Effectively addressing these run-time irregularities is therefore key to designing sequence-search tools that scale on massively parallel computers. Computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have each been well studied on their own, but little attention has been paid to how the two interact. In this paper, the authors systematically investigate computation and I/O scheduling for data-intensive, irregular scientific applications, using genomic sequence search as the context.