Graph-Parallel Entity Resolution using LSH & IMM

Provided by: Creative Commons
Topic: Data Management
Format: PDF
In this paper, the authors describe graph-based parallel algorithms for entity resolution that improve on the MapReduce approach. They compare two ways to parallelize an Iterative Match-Merge (IMM) entity resolution technique accelerated by Locality Sensitive Hashing (LSH): Bucket-Centric Parallelization (BCP), in which records hashed to the same bucket are compared at a single node or reducer, and Record-Centric Parallelization (RCP), in which the comparison load is distributed more evenly across processors, especially when bucket sizes are severely skewed. They evaluate both approaches analytically and empirically, using a large synthetically generated dataset.
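
The load-skew argument can be illustrated with a small sketch. This is not the authors' implementation: it skips the hashing and the match-merge step entirely and only counts pairwise comparisons per worker. The function names, the round-robin bucket placement, and the "each record owns comparisons against higher-id bucket mates" rule are all assumptions made for illustration.

```python
def bcp_load(buckets, n_workers):
    """Bucket-centric (assumed model): every pair in a bucket is
    compared on the single worker that owns that bucket."""
    load = [0] * n_workers
    for i, records in enumerate(buckets):
        k = len(records)
        # Hypothetical placement: buckets assigned round-robin.
        load[i % n_workers] += k * (k - 1) // 2
    return load


def rcp_load(buckets, n_workers):
    """Record-centric (assumed model): each record is compared with the
    higher-id records in its bucket, on the worker owning that record."""
    load = [0] * n_workers
    for records in buckets:
        ordered = sorted(records)
        for pos, rec in enumerate(ordered):
            # Hypothetical placement: records assigned by id modulo workers.
            load[rec % n_workers] += len(ordered) - pos - 1
    return load


# One large bucket and three small ones: a severely skewed LSH outcome.
buckets = [list(range(8)), [8, 9], [10, 11], [12, 13]]

print("BCP per-worker comparisons:", bcp_load(buckets, 4))
print("RCP per-worker comparisons:", rcp_load(buckets, 4))
```

Both schemes perform the same total number of comparisons. Under these assumptions the bucket-centric scheme places the whole large bucket on one worker, while the record-centric scheme spreads that bucket's pairs across all workers, so the busiest worker has far less to do.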
