Institute of Electrical and Electronics Engineers
In this paper, the authors study schemes for parallelizing trellis algorithms so that they can be implemented efficiently on a GPU. They consider parallelism at the packet level, the sub-block level, and the trellis level as ways to increase the number of threads in a GPU implementation. At the trellis level, they consider state-level, forward-backward traversal, and branch-metric parallelism. To evaluate the schemes, an LTE uplink Turbo decoder is implemented on an NVIDIA GTX 470 GPU. Tradeoffs among throughput, latency, and bit error rate are presented.
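As a rough illustration of state-level parallelism, the sketch below runs a max-log forward recursion over a toy 4-state trellis. It is not the authors' GPU kernel: the trellis, the function names (`predecessors`, `update_state`, `forward_recursion`), and the branch-metric layout are all assumptions chosen for brevity. The point it tries to show is that, within one trellis step, each state's update reads only the previous step's metrics, so a GPU implementation could plausibly assign one thread per state.

```python
# Toy 4-state shift-register trellis (an assumption for illustration only;
# it is not the LTE constituent code). Transition: next = ((p << 1) | bit) & 3.
NUM_STATES = 4
NEG_INF = float("-inf")


def predecessors(state):
    """Return the (prev_state, input_bit) pairs that lead into `state`."""
    bit = state & 1
    return [(state >> 1, bit), ((state >> 1) | 2, bit)]


def update_state(state, alpha_prev, gamma_k):
    """Work of one hypothetical GPU thread: max-log update of a single state.

    gamma_k[p][b] is the branch metric for leaving state p with input bit b.
    """
    return max(alpha_prev[p] + gamma_k[p][b] for p, b in predecessors(state))


def forward_recursion(gammas):
    """Compute forward state metrics for every trellis step."""
    alpha = [0.0] + [NEG_INF] * (NUM_STATES - 1)  # trellis starts in state 0
    history = [alpha]
    for gamma_k in gammas:  # steps are inherently sequential
        # The updates below are independent of each other, so they could
        # run concurrently, one thread per state.
        alpha = [update_state(s, alpha, gamma_k) for s in range(NUM_STATES)]
        history.append(alpha)
    return history


if __name__ == "__main__":
    # Branch metric 1.0 for input bit 1 and 0.0 for input bit 0, every state.
    gamma = [[0.0, 1.0] for _ in range(NUM_STATES)]
    for step, metrics in enumerate(forward_recursion([gamma, gamma])):
        print(step, metrics)
```

The outer loop over trellis steps stays sequential, which is why the abstract also lists sub-block and packet-level schemes: they add independent work across steps and code blocks rather than within a single step.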