An Optimized Implementation of N x N Parallel Decimal Multiplier Using CSA
In this paper, the authors introduce two novel architectures for parallel decimal multipliers. The multipliers are based on a new algorithm for decimal carry-save multioperand addition that uses a novel BCD-4221 recoding of decimal digits. This algorithm significantly improves the area and latency of the partial product reduction tree with respect to previous proposals. Decimal floating-point multiplication is important in many commercial applications, including banking, tax calculation, currency conversion, and other financial areas. According to the authors, the design is the first parallel decimal floating-point multiplier to offer both low latency and high throughput.
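To make the role of the BCD-4221 recoding concrete, the following is a minimal software sketch, not the authors' hardware design. It assumes the usual reading of BCD-4221 as four bits with weights 4, 2, 2, and 1. The coding is not unique, so the greedy choice in `encode_4221` is one arbitrary valid code, and all function names are hypothetical. The sketch shows why this coding suits carry-save addition: every 4-bit pattern decodes to a digit from 0 to 9, so an ordinary binary 3:2 carry-save adder applied bit by bit to three digits yields a sum digit and a carry digit that are both valid decimal digits. The carry digit carries weight 2. How the multiplier doubles it in hardware is not described in this abstract, so the check below uses a plain integer multiply.

```python
# Bit weights of a BCD-4221 digit, most significant bit first (assumed).
WEIGHTS = (4, 2, 2, 1)

def encode_4221(digit):
    """Return one valid BCD-4221 bit tuple for a digit 0..9 (greedy choice)."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be in 0..9")
    bits = []
    remaining = digit
    for weight in WEIGHTS:
        if remaining >= weight:
            bits.append(1)
            remaining -= weight
        else:
            bits.append(0)
    return tuple(bits)

def decode_4221(bits):
    """Return the decimal value of a BCD-4221 bit tuple."""
    return sum(w * b for w, b in zip(WEIGHTS, bits))

def csa_3to2(a, b, c):
    """Bitwise binary 3:2 carry-save adder on three BCD-4221 digits.

    Returns (s, h): s is the sum digit, h is the carry digit of weight 2,
    so that a + b + c == s + 2 * h in decimal value.
    """
    s = tuple(x ^ y ^ z for x, y, z in zip(a, b, c))
    h = tuple((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))
    return s, h

if __name__ == "__main__":
    a, b, c = encode_4221(9), encode_4221(8), encode_4221(7)
    s, h = csa_3to2(a, b, c)
    # The carry digit must be doubled before it is added to the sum digit.
    print(decode_4221(s), decode_4221(h), decode_4221(s) + 2 * decode_4221(h))
```

The same bitwise step on conventional BCD-8421 digits would not preserve this property, because the sum or carry vector can take a value above 9, which is the kind of correction a 4221 coding avoids.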