pSHS: A Scalable Parallel Software Implementation of Montgomery Multiplication for Multicore Systems
Parallel programming has become one of the great challenges in the transition from single-core to multicore architectures. In this paper, the authors investigate the parallelization of Montgomery multiplication, a very common and time-consuming primitive in public-key cryptography. A scalable parallel programming scheme, called pSHS, is presented to map Montgomery multiplication onto a general multicore architecture. The pSHS scheme offers a considerable speedup: on 2-, 4-, and 8-core systems, a parallelized 2048-bit Montgomery multiplication achieves speedups of 1.98, 3.74, and 6.53, respectively. pSHS delivers stable performance, high portability, high throughput, and low latency across different multicore systems.
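For readers unfamiliar with the primitive being parallelized, the following is a minimal sequential sketch of Montgomery multiplication in Python. It is not the pSHS scheme and does no work partitioning across cores; it only illustrates the operation a*b*R^(-1) mod n that pSHS maps to multiple cores. The function names (`montgomery_setup`, `mont_mul`, `to_mont`, `from_mont`) and the choice R = 2^k are assumptions made for illustration and do not come from the paper.

```python
# Illustrative sequential Montgomery multiplication (not the pSHS scheme).
# All names here are hypothetical; R = 2**k is an assumed, conventional choice.
# Requires Python 3.8+ for pow(n, -1, r).

def montgomery_setup(n, k):
    """Return n' such that n * n' == -1 (mod 2**k), for odd n < 2**k."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    return (-pow(n, -1, r)) % r


def mont_mul(a, b, n, k, n_prime):
    """Return a * b * R^(-1) mod n, with R = 2**k and 0 <= a, b < n."""
    mask = (1 << k) - 1
    t = a * b
    # Choose m so that t + m*n is divisible by R.
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k  # exact division by R
    # u is below 2n, so one conditional subtraction suffices.
    return u - n if u >= n else u


def to_mont(x, n, k):
    """Map x to its Montgomery form x * R mod n."""
    return (x << k) % n


def from_mont(x, n, k, n_prime):
    """Map a Montgomery-form value back to an ordinary residue."""
    return mont_mul(x, 1, n, k, n_prime)


def modmul_via_montgomery(a, b, n, k):
    """Compute a * b mod n by a round trip through Montgomery form."""
    n_prime = montgomery_setup(n, k)
    am, bm = to_mont(a, n, k), to_mont(b, n, k)
    return from_mont(mont_mul(am, bm, n, k, n_prime), n, k, n_prime)
```

In practice the conversion to and from Montgomery form is amortized over many multiplications, as in modular exponentiation, and multi-precision implementations process the operands word by word rather than as single big integers. That word-level inner loop is the kind of work a parallel scheme would distribute.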