Unleash Your Memory-Constrained Applications: A 32-Node Non-Coherent Distributed-Memory Prototype Cluster
Improvements in hardware for parallel shared-memory computing usually involve increases in the number of computing cores and in the amount of memory available to a given application. However, many shared-memory applications do not require more computing cores than are available on current motherboards, because their scalability is limited to a few tens of parallel threads. They may nevertheless benefit from additional memory resources. In this paper the authors present a 32-node prototype of a new non-coherent distributed-memory architecture for clusters. The architecture aims to provide applications with additional memory borrowed from other nodes without giving them more cores, thus avoiding the penalty of maintaining coherency among the nodes of the cluster.