Constructing Inverted Files on a Cluster of Multicore Processors Near Peak I/O Throughput

Source: University of Maryland

The authors develop a new strategy for processing a collection of documents on a cluster of multicore processors to build inverted files at almost the peak I/O throughput of the underlying system. The algorithm is based on a number of novel techniques, including: a high-throughput pipelined strategy that produces parallel parsed streams that are consumed at the same rate by parallel indexers; a hybrid trie and B-tree dictionary data structure that enables efficient parallel construction of the global dictionary; and a strategy for partitioning the work of the indexers using random sampling, which achieves extremely good load balancing with minimal communication overhead.
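As a rough illustration of the sampling-based partitioning idea, the sketch below picks splitter terms from a random sample of the parsed term stream and routes each term to one of several indexers by range. It is a minimal single-process sketch under stated assumptions, not the authors' algorithm: the function names (`choose_splitters`, `assign_indexer`, `build_inverted_files`), the whitespace tokenizer, and the plain dictionaries standing in for the hybrid trie and B-tree dictionary are all hypothetical.

```python
import bisect
import random
from collections import defaultdict


def choose_splitters(terms, num_indexers, sample_size, seed=0):
    """Pick num_indexers - 1 splitter terms from a random sample of the terms."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(terms, min(sample_size, len(terms))))
    if not sample:
        return []
    # Evenly spaced positions in the sorted sample approximate term ranges
    # that carry roughly equal numbers of postings.
    return [sample[i * len(sample) // num_indexers] for i in range(1, num_indexers)]


def assign_indexer(term, splitters):
    """Return the number of the indexer whose term range contains this term."""
    return bisect.bisect_right(splitters, term)


def build_inverted_files(docs, num_indexers=4, sample_size=1000):
    """Build one term -> postings dictionary per (simulated) indexer."""
    # Assumption: "parsing" is plain lowercase whitespace tokenization.
    stream = [(tok, doc_id)
              for doc_id, text in docs.items()
              for tok in text.lower().split()]
    splitters = choose_splitters([t for t, _ in stream], num_indexers, sample_size)
    partitions = [defaultdict(list) for _ in range(num_indexers)]
    for term, doc_id in stream:
        postings = partitions[assign_indexer(term, splitters)][term]
        # The stream is grouped by document, so comparing against the last
        # entry is enough to avoid duplicate document ids.
        if not postings or postings[-1] != doc_id:
            postings.append(doc_id)
    return splitters, partitions


if __name__ == "__main__":
    docs = {1: "a b c", 2: "b c d", 3: "c d e"}
    splitters, partitions = build_inverted_files(docs, num_indexers=2)
    for number, part in enumerate(partitions):
        print(number, dict(part))
```

Because each indexer owns a disjoint term range, the per-indexer dictionaries can be built without coordination once the splitters are known; in this sketch the splitters are the only shared state.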
Format: PDF Size: 795.81
Date: Mar 2011