Parallel All Pairs Similarity Search

Executive Summary

In this paper, the authors present the first scalable parallel solution for the All Pairs Similarity Search (APSS) problem, which involves finding all pairs of data records whose similarity score exceeds a specified threshold. With exponentially growing datasets and modern multi-processor/multi-core system architectures, the serial nature of all existing APSS solutions is the main factor limiting the applicability of APSS to large-scale real-world problems, and it calls for parallelization. Their proposed index sharing technique divides the APSS computation into independent searches over a central inverted index that is shared across all processors as a read-only data structure, and it achieves linear speed-up over the fastest serial APSS algorithm in a shared-memory environment.
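
The general idea of index sharing can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not specify the similarity measure, pruning strategy, or work partitioning, so cosine similarity over sparse term-weight dictionaries and all function names here (`normalize`, `build_index`, `search_one`, `all_pairs`) are assumptions made for illustration. The inverted index is built once, never modified afterwards, and each query record is searched against it independently by a pool of workers.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def normalize(vec):
    """Scale a sparse vector (term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0:
        return {}
    return {term: w / norm for term, w in vec.items()}


def build_index(records):
    """Build the inverted index once: term -> list of (record id, weight)."""
    index = defaultdict(list)
    for rid, vec in enumerate(records):
        for term, w in vec.items():
            index[term].append((rid, w))
    return dict(index)  # treated as read-only from here on


def search_one(qid, records, index, threshold):
    """Independent search for one query record; only reads shared data."""
    scores = defaultdict(float)
    for term, qw in records[qid].items():
        for rid, w in index.get(term, ()):
            if rid > qid:  # report each unordered pair once
                scores[rid] += qw * w
    return [(qid, rid, s) for rid, s in scores.items() if s >= threshold]


def all_pairs(records, threshold, workers=4):
    """Return all (i, j, cosine) with i < j and cosine >= threshold."""
    records = [normalize(r) for r in records]
    index = build_index(records)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(
            lambda q: search_one(q, records, index, threshold),
            range(len(records)),
        )
        return sorted(pair for chunk in chunks for pair in chunk)
```

Because no worker writes to the index, the searches need no locking. In CPython, threads will not give a real speed-up for this CPU-bound loop because of the global interpreter lock, so the sketch shows only the structure of the computation; the linear speed-up reported in the summary refers to the authors' own shared-memory implementation.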

  • Format: PDF
  • Size: 769.88 KB