Provided by: Carnegie Mellon University
Date Added: Apr 2013
In this paper the authors present a practical external hashing scheme that supports fast lookup (7 microseconds) for large datasets (millions to billions of items) with a small memory footprint (2.5 bits/item) and fast index construction (151 K items/s for 1-KiB key-value pairs). Their scheme combines three key techniques: a new index data structure (entropy-coded tries); the use of sorting as the main data manipulation method; and support for incremental index construction for dynamic datasets. They evaluate their scheme by building an external dictionary on flash-based drives and demonstrate their scheme's high performance, compactness, and practicality.