Efficient Parallel Partition-based Algorithms for Similarity Search and Join with Edit Distance Constraints

The volume of data in real-world applications is growing rapidly, while data quality remains a major problem. Similarity search and similarity join are two important operations for addressing poor data quality. Although many similarity search and join algorithms have been proposed, they do not exploit modern hardware with multi-core processors. New parallel algorithms are therefore needed so that multi-core processors can meet the high performance requirements of similarity search and join on big data. To this end, the authors propose parallel algorithms that support efficient similarity search and join with edit-distance constraints.

Provided by: Association for Computing Machinery
Topic: Data Centers
Date Added: Mar 2013
Format: PDF
