Optimizing Irregular Data Accesses for Cluster and Multi-Core Architectures
Applications with irregular accesses to shared state are one of the most challenging computational patterns in parallel computing. Accesses can involve both read or write operations, with writes having the additional complexity of requiring some form of synchronization. Irregular accesses perform poorly in local cached-based memory systems and across networks in global distributed memory settings, because they have poor spatial and temporal locality. Irregular accesses arises in transaction processing, in various system level programs, in computing histograms, performing sparse matrix operations, updating meshes in particle-mesh methods, and building adaptive unstructured meshes. Writing codes with asynchronous parallel updates on clusters and multi-core processors presents different sets of challenges.