Date Added: Jun 2011
RSA computations have a significant effect on the workloads of SSL/TLS servers, so their software implementations on general-purpose processors are an important target for optimization. The authors concentrate here on 512-bit modular exponentiation. This is the core operation of 1024-bit RSA, because the private-key operation is split into two half-size exponentiations using the Chinese Remainder Theorem (CRT). They propose optimizations in two directions. At the primitive level, they study and improve the performance of an "Almost" Montgomery Multiplication (AMM). At the exponentiation level, they propose a method to reduce the cost of protecting the w-ary (fixed-window) exponentiation algorithm against cache and timing side-channel attacks. Together, these lead to an efficient software implementation of 512-bit modular exponentiation that outperforms the fastest publicly available alternative at the time of publication.
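The pieces named above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it does not reproduce their method for lowering the cost of side-channel protection. The sketch assumes the common definition of AMM, in which the result is kept below 2^n but not necessarily below the modulus, and it uses a generic countermeasure: every table lookup reads all entries and masks out the unwanted ones. The function names `amm`, `ct_select`, `mod_exp_window` and `rsa_crt`, and the window size `w = 5`, are hypothetical choices for illustration. Python big-integer arithmetic is not constant time, so the code only shows the data flow.

```python
# Illustrative sketch only: names are hypothetical, and Python big-integer
# arithmetic is NOT constant time, so this shows the data flow, not a
# side-channel-safe implementation.

def amm(a, b, m, m_prime, n):
    """Almost Montgomery Multiplication (AMM).

    Returns t with t == a*b*2^-n (mod m) and 0 <= t < 2^n, given m odd,
    m < 2^n, 0 <= a, b < 2^n and m_prime == -m^-1 mod 2^n. The result is
    reduced below 2^n only, so it may still be >= m.
    """
    r_mask = (1 << n) - 1
    t = a * b
    q = ((t & r_mask) * m_prime) & r_mask  # makes t + q*m divisible by 2^n
    t = (t + q * m) >> n                   # exact division; now t < 2^n + m
    if t >> n:                             # subtract m only if t >= 2^n
        t -= m
    return t


def ct_select(table, idx):
    """Return table[idx] after reading every entry and masking the others.
    In C this keeps the memory-access pattern independent of idx."""
    result = 0
    for i, entry in enumerate(table):
        mask = -(i == idx)                 # -1 (all ones) if selected, else 0
        result |= entry & mask
    return result


def mod_exp_window(g, e, m, n, w=5):
    """g^e mod m via fixed-window (w-ary) exponentiation built on AMM.
    The exponent is padded to n bits so the operation count is fixed."""
    assert m % 2 == 1 and m < (1 << n) and 0 <= e < (1 << n)
    R = 1 << n
    m_prime = (-pow(m, -1, R)) % R         # -m^-1 mod 2^n (Python 3.8+)
    r2 = (R * R) % m                       # used to convert into Montgomery form
    one_mont = R % m                       # 1 in Montgomery form
    g_mont = amm(g % m, r2, m, m_prime, n) # congruent to g*R (mod m)
    table = [one_mont]                     # table[i] == g^i * R (mod m)
    for _ in range(1, 1 << w):
        table.append(amm(table[-1], g_mont, m, m_prime, n))

    acc = one_mont
    for k in reversed(range((n + w - 1) // w)):
        for _ in range(w):                 # w squarings per window
            acc = amm(acc, acc, m, m_prime, n)
        idx = (e >> (k * w)) & ((1 << w) - 1)
        acc = amm(acc, ct_select(table, idx), m, m_prime, n)

    result = amm(acc, 1, m, m_prime, n)    # leave Montgomery form; result <= m
    return result - m if result >= m else result


def rsa_crt(c, p, q, dp, dq, q_inv):
    """RSA private-key operation: two half-size exponentiations + CRT."""
    mp = mod_exp_window(c % p, dp, p, p.bit_length())
    mq = mod_exp_window(c % q, dq, q, q.bit_length())
    h = (q_inv * (mp - mq)) % p            # Garner's recombination
    return mq + h * q
```

For a 1024-bit key, `p` and `q` are 512-bit primes. Each call to `mod_exp_window` inside `rsa_crt` is therefore the 512-bit exponentiation the abstract targets, with `dp = d mod (p-1)`, `dq = d mod (q-1)` and `q_inv = q^-1 mod p`. Skipping the final subtraction inside each AMM works because a result below 2^n is still a valid input to the next multiplication. A single full reduction is needed only at the very end.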