Sorting a list of input numbers is one of the most fundamental problems in the field of computer science in general and high-throughput database applications in particular. Although literature abounds with various flavors of sorting algorithms, different architectures call for customized implementations to achieve faster sorting times. In this paper, the authors present an efficient implementation and detailed analysis of MergeSort on current CPU architectures. Their SIMD implementation with 128-bit SSE is 3.3X faster than the scalar version. In addition, their algorithm performs an efficient multiway merge, and is not constrained by the memory bandwidth.