Exact Pattern Matching With Feed-Forward Bloom Filters

This paper presents a new memory-efficient, cache-optimized algorithm for simultaneously searching for a large number of patterns in a very large corpus. The algorithm builds on the Rabin-Karp string search algorithm and incorporates a new type of Bloom filter that the authors call a feed-forward Bloom filter. While the algorithm retains the asymptotic time complexity of previous multiple-pattern matching algorithms, the authors show that this technique, together with a CPU-architecture-aware design of the Bloom filter, can provide speedups between 2x and 30x and memory consumption reductions as large as 50x when compared with grep.
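To make the general idea concrete, here is a minimal sketch of Rabin-Karp-style multiple-pattern search that uses a Bloom filter as a pre-filter. It is not the authors' implementation: the abstract does not describe the feed-forward mechanism or the cache-aware layout, so an ordinary Bloom filter stands in for them. The names BloomFilter, poly_hash and search, the constants BASE and MOD, and the filter sizes are all assumptions chosen for illustration.

```python
# Illustrative sketch only: an ordinary Bloom filter (not the paper's
# feed-forward variant) placed in front of a Rabin-Karp rolling hash.
BASE = 257               # assumed polynomial base
MOD = (1 << 61) - 1      # assumed large prime modulus


class BloomFilter:
    """Plain Bloom filter keyed by integer hash values."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits          # power of two in this sketch
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two base values.
        h1 = key % self.num_bits
        h2 = ((key // self.num_bits) % self.num_bits) | 1  # odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def poly_hash(text):
    """Polynomial hash of a whole string (Rabin-Karp fingerprint)."""
    h = 0
    for ch in text:
        h = (h * BASE + ord(ch)) % MOD
    return h


def search(patterns, corpus):
    """Return (offset, pattern) pairs for every exact occurrence.

    Assumes every pattern is non-empty. Each pattern is represented in
    the filter by its first `window` characters, where `window` is the
    length of the shortest pattern.
    """
    if not patterns:
        return []
    window = min(len(p) for p in patterns)
    bloom = BloomFilter()
    by_prefix = {}
    for p in patterns:
        prefix = p[:window]
        bloom.add(poly_hash(prefix))
        by_prefix.setdefault(prefix, []).append(p)

    matches = []
    if len(corpus) < window:
        return matches
    top = pow(BASE, window - 1, MOD)      # weight of the outgoing character
    h = poly_hash(corpus[:window])
    for i in range(len(corpus) - window + 1):
        if i > 0:
            # Slide the window: drop corpus[i-1], append the next character.
            h = ((h - ord(corpus[i - 1]) * top) * BASE
                 + ord(corpus[i + window - 1])) % MOD
        if h in bloom:
            # Bloom filters give false positives, so verify exactly.
            for p in by_prefix.get(corpus[i:i + window], ()):
                if corpus.startswith(p, i):
                    matches.append((i, p))
    return matches
```

For example, search(["needle", "hay"], "haystack with a needle in the hay") reports "hay" at offsets 0 and 30 and "needle" at offset 16. Most corpus windows are rejected by the filter lookup alone, so the exact comparison runs only on the few windows that pass it.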

Provided by: Carnegie Mellon University
Topic: Data Management
Date Added: Jan 2011
Format: PDF
