Imperial College London
In many application domains, data are represented as large graphs with millions of vertices and billions of edges. Graph exploration algorithms such as Breadth-First Search (BFS) are dominated by memory latency, which makes them difficult to execute efficiently. In this paper, the authors present a reconfigurable hardware methodology for efficient parallel processing of large-scale graph exploration problems. The methodology is built on a reconfigurable hardware architecture that decouples computation from communication and keeps multiple memory requests in flight at any given time, exploiting the capabilities of both FPGAs and the parallel memory subsystem.
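To illustrate why BFS is bound by memory latency, the following is a minimal software sketch of a level-synchronous BFS. It is not the authors' hardware design: the compressed sparse row (CSR) layout, the function name `bfs_levels`, and the parameters `row_ptr` and `col_idx` are assumptions chosen for illustration. The point to notice is that each `level[v]` lookup depends on data just fetched from the neighbor list, so the accesses are irregular and hard to prefetch. Keeping many such requests in flight at once is one way to hide that latency.

```python
from typing import List


def bfs_levels(row_ptr: List[int], col_idx: List[int], source: int) -> List[int]:
    """Level-synchronous BFS over a graph stored in CSR form (assumed layout).

    row_ptr[u]:row_ptr[u + 1] delimits the neighbors of vertex u in col_idx.
    Returns the BFS level of every vertex, or -1 if it is unreachable.
    """
    num_vertices = len(row_ptr) - 1
    level = [-1] * num_vertices
    level[source] = 0
    frontier = [source]
    depth = 0

    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            # The neighbor list of u is a contiguous slice of col_idx.
            for v in col_idx[row_ptr[u]:row_ptr[u + 1]]:
                # Data-dependent access: v is only known after the
                # neighbor list has been read, so level[v] lands at an
                # effectively random address.
                if level[v] == -1:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier

    return level


# Hypothetical 5-vertex graph: 0->1, 0->2, 1->3, 2->3; vertex 4 is isolated.
example_row_ptr = [0, 2, 3, 4, 4, 4]
example_col_idx = [1, 2, 3, 3]
example_levels = bfs_levels(example_row_ptr, example_col_idx, 0)
```

In this sketch the inner loop issues one dependent lookup at a time. A design that decouples computation from communication would instead issue the lookups for a whole frontier without waiting for each result.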