Solving Discrete Logarithms in Smooth-Order Groups with CUDA

In this paper, the authors chronicles their experiences using CUDA to implement a parallelized variant of Pollard's rho algorithm to solve discrete logarithms in groups with cryptographically large moduli but smooth order using commodity GPUs. They first discuss some key design constraints imposed by modern GPU architectures and the CUDA framework and then explain how they were able to implement efficient arbitrary-precision modular multiplication within these constraints. Their implementation can execute roughly 51.9 million 768-bit modular multiplications per second - or a whopping 840 million 192-bit modular multiplications per second - on a single Nvidia Tesla M2050 GPU card, which is a notable improvement over all previous results on comparable hardware.

Provided by: University of Waterloo Topic: Hardware Date Added: Mar 2012 Format: PDF

Find By Topic