This paper presents a new memory-efficient, cache-optimized algorithm for simultaneously searching for a large number of patterns in a very large corpus. The algorithm builds on the Rabin-Karp string search algorithm and incorporates a new type of Bloom filter that the authors call a feed-forward Bloom filter. Although it retains the asymptotic time complexity of previous multiple-pattern matching algorithms, the authors show that this technique, together with a CPU-architecture-aware design of the Bloom filter, can provide speedups of 2x to 30x and memory consumption reductions of up to 50x compared with grep.
- Format: PDF
- Size: 491.1 KB
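The abstract does not spell out the data structure, so the following Python is only a minimal sketch of one reading of the idea, not the authors' implementation. A Rabin-Karp rolling hash is computed over every window of the text. Each window hash is checked against a Bloom filter built from the patterns. Every hit is also inserted into a second "feed-forward" filter, so that afterwards only patterns whose hash appears in that second filter need exact verification. The names `BloomFilter` and `search`, the parameters `num_bits` and `num_hashes`, the blake2b-derived hash functions, and using the shortest pattern's length as the window size are all hypothetical choices made for illustration. The sketch also ignores the cache-conscious filter layout that the abstract credits for much of the speedup.

```python
import hashlib

BASE = 257
MOD = (1 << 61) - 1  # modulus for the rolling hash (assumed choice)


class BloomFilter:
    """Plain Bloom filter over integer keys; k hash functions derived from blake2b (assumption)."""

    def __init__(self, num_bits: int, num_hashes: int, salt: bytes) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.salt = salt  # distinct salts give the two filters independent hash functions
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: int):
        data = key.to_bytes(8, "little")
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(bytes([i]) + self.salt + data, digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: int) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def window_hash(chunk: bytes) -> int:
    """Polynomial (Rabin-Karp) hash of a byte string."""
    h = 0
    for c in chunk:
        h = (h * BASE + c) % MOD
    return h


def search(text: bytes, patterns: list[bytes], num_bits: int = 1 << 16, num_hashes: int = 3):
    """Return (offset, pattern) pairs for every exact occurrence of any pattern in text."""
    if not patterns or min(len(p) for p in patterns) == 0:
        raise ValueError("need at least one non-empty pattern")
    m = min(len(p) for p in patterns)  # window length: hash only each pattern's first m bytes
    if len(text) < m:
        return []

    first = BloomFilter(num_bits, num_hashes, b"first")
    for p in patterns:
        first.add(window_hash(p[:m]))

    # Phase 1: slide over the text; every filter hit is also fed forward.
    feed_forward = BloomFilter(num_bits, num_hashes, b"ff")
    candidates = []  # window offsets that hit the first filter (may include false positives)
    power = pow(BASE, m - 1, MOD)
    h = window_hash(text[:m])
    for i in range(len(text) - m + 1):
        if i > 0:  # roll the hash: drop text[i-1], append text[i+m-1]
            h = ((h - text[i - 1] * power) * BASE + text[i + m - 1]) % MOD
        if h in first:
            candidates.append(i)
            feed_forward.add(h)

    # Phase 2: only patterns whose prefix hash reached the feed-forward filter can have matched.
    survivors = [p for p in patterns if window_hash(p[:m]) in feed_forward]
    matches = []
    for i in candidates:
        for p in survivors:
            if text.startswith(p, i):  # exact verification removes Bloom false positives
                matches.append((i, p))
    return matches
```

For example, `search(b"the quick brown fox jumps over the lazy dog", [b"fox", b"lazy", b"cat", b"quick"])` returns `[(4, b"quick"), (16, b"fox"), (35, b"lazy")]`. Pattern `b"cat"` is normally dropped before phase 2 because its hash never reached the feed-forward filter. A Bloom filter has no false negatives, so neither filter can lose a real match; false positives only cost extra exact comparisons.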