Improving Load Balance and Query Throughput of Distributed IR Systems

Date Added: Jun 2010
Format: PDF

As the number of queries grows over time, an Information Retrieval (IR) system must provide a high query processing rate, i.e., high query throughput. IR systems use three types of data partitioning: term-based, document-based, and hybrid. In document-based and hybrid partitioning, a query is sent to all nodes, which achieves a high level of parallelism but low query throughput. In term-based partitioning, a query is divided into sub-queries, and each sub-query is directed to the relevant node. This provides high query throughput and concurrency but poor parallelism and load balance.
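
The routing difference between document-based and term-based partitioning can be illustrated with a minimal Python sketch. It is not the method of the paper: the function names and the byte-sum placement rule in `term_node` are hypothetical, chosen only so that each term maps deterministically to one node. Hybrid partitioning is left out.

```python
from collections import defaultdict


def term_node(term, num_nodes):
    # Hypothetical placement rule (an assumption, not from the paper):
    # a deterministic byte-sum hash decides which node holds the term.
    return sum(term.encode("utf-8")) % num_nodes


def route_document_based(query_terms, num_nodes):
    # Document-based partitioning: every node holds a slice of the
    # documents, so every node receives the full query.
    return {node: list(query_terms) for node in range(num_nodes)}


def route_term_based(query_terms, num_nodes):
    # Term-based partitioning: the query is split into sub-queries and
    # each sub-query goes only to the node responsible for its terms.
    sub_queries = defaultdict(list)
    for term in query_terms:
        sub_queries[term_node(term, num_nodes)].append(term)
    return dict(sub_queries)


if __name__ == "__main__":
    query = ["load", "balance"]
    print(route_document_based(query, 4))
    print(route_term_based(query, 4))
```

For a two-term query on four nodes, document-based routing contacts all four nodes, while term-based routing contacts at most two. The remaining nodes stay free to serve other queries, which is where the throughput gain comes from. Nothing in this sketch prevents popular terms from concentrating work on a few nodes, which is the load-balance problem the abstract refers to.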