Accurate Parallel Floating-Point Accumulation

Provided by: Institute of Electrical & Electronic Engineers
Topic: Hardware
Format: PDF
Using parallel associative reduction, iterative refinement, and conservative termination detection, the authors show how tree-reduce parallelism can compute correctly rounded floating-point sums in O(log N) depth at arbitrary throughput. Their parallel solution shows how Moore's Law scaling in transistor count can continue to accelerate floating-point performance even when clock rates remain flat. Empirical evidence suggests that the iterative algorithm requires only two tree-reduce passes to converge to the accurate sum in virtually all cases.
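
The general idea of a tree reduction followed by refinement passes can be illustrated in software. The sketch below is not the authors' hardware design: it assumes the well-known TwoSum error-free transformation at each tree node, collects the rounding errors, and feeds them through further tree passes. The names `two_sum`, `tree_reduce_pass`, `accumulate`, and `max_passes` are hypothetical, and the stopping rule (stop when a pass no longer changes the total) is a simple heuristic standing in for the paper's conservative termination detection, so it does not guarantee a correctly rounded result.

```python
import math


def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def tree_reduce_pass(values):
    """One pairwise (tree) reduction pass.

    Returns (root, errors) such that root + sum(errors) equals the exact
    sum of `values`. The pairs within a level are independent, so each
    level could in principle be evaluated in parallel, giving O(log N) depth.
    """
    level = list(values)
    errors = []
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            s, e = two_sum(level[i], level[i + 1])
            nxt.append(s)
            if e != 0.0:
                errors.append(e)  # keep the rounding error for a later pass
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through unchanged
        level = nxt
    root = level[0] if level else 0.0
    return root, errors


def accumulate(values, max_passes=8):
    """Iteratively refine the sum; returns (total, number_of_passes).

    Assumption: stopping when the total no longer changes is a heuristic,
    not the conservative termination test described in the abstract.
    """
    total, errors = tree_reduce_pass(values)
    passes = 1
    while errors and passes < max_passes:
        correction, errors = tree_reduce_pass(errors)
        new_total, lost = two_sum(total, correction)
        passes += 1
        if new_total == total:
            break  # the residual errors did not move the total
        total = new_total
        if lost != 0.0:
            errors.append(lost)
    return total, passes


data = [1e16, 1.0, -1e16, 1.0]
total, passes = accumulate(data)
print(total, passes, math.fsum(data))
```

On this small input the first pass loses both `1.0` terms to rounding and the second pass recovers them, which loosely mirrors the abstract's observation that two passes usually suffice; `math.fsum` is used only as an independent reference value.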
