Association for Computing Machinery
The quantity of data in real-world applications is growing significantly while the data quality is still a big problem. Similarity search and similarity join are two important operations to address the poor data quality problem. Although many similarity search and join algorithms have been proposed, they did not utilize the abilities of modern hardware with multi-core processors. It calls for new parallel algorithms to enable multi-core processors to meet the high performance requirement of similarity search and join on big data. To this end, in this paper, the authors propose parallel algorithms to support efficient similarity search and join with edit-distance constraints.