Constructing Inverted Files on a Cluster of Multicore Processors Near Peak I/O Throughput
Source: University of Maryland
The authors develop a new strategy for processing a collection of documents on a cluster of multicore processors to build inverted files at almost the peak I/O throughput of the underlying system. The algorithm is based on a number of novel techniques, including a high-throughput pipelined strategy that produces parallel parsed streams that are consumed at the same rate by parallel indexers; a hybrid trie and B-tree dictionary data structure that enables efficient parallel construction of the global dictionary; and a strategy for partitioning the indexers' work using random sampling, which achieves extremely good load balancing with minimal communication overhead.
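The sampling-based partitioning idea can be illustrated with a generic splitter-selection sketch. This is not the authors' implementation: the abstract does not give the details, so the function names (`choose_splitters`, `assign_indexer`, `partition_load`), the `sample_size` parameter, and the use of lexicographic term ranges are all assumptions made for illustration. The sketch draws a random sample of term occurrences, takes evenly spaced sample quantiles as splitters, and routes each term to the indexer that owns its range.

```python
import bisect
import random
from collections import Counter


def choose_splitters(terms, num_indexers, sample_size, seed=0):
    """Pick num_indexers - 1 splitter terms from a random sample (hypothetical helper)."""
    rng = random.Random(seed)
    # Sampling term occurrences (not distinct terms) weights frequent terms more heavily.
    sample = sorted(rng.sample(terms, min(sample_size, len(terms))))
    # Evenly spaced sample quantiles approximate the quantiles of the full stream.
    return [sample[(i * len(sample)) // num_indexers] for i in range(1, num_indexers)]


def assign_indexer(term, splitters):
    """Return the index of the indexer whose term range contains `term`."""
    return bisect.bisect_right(splitters, term)


def partition_load(terms, splitters):
    """Count how many term occurrences each indexer would receive."""
    load = Counter(assign_indexer(t, splitters) for t in terms)
    return [load[i] for i in range(len(splitters) + 1)]
```

Because the splitters are computed once from a small sample, each parser can route terms locally without further coordination, which is one plausible reading of "minimal communication overhead"; how evenly the load is actually spread depends on the sample size and the term distribution.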
| Size: | 795.81 |
| Date: | Mar 2011 |



