Adaptively Parallelizing Distributed Range Queries

Executive Summary

The authors consider how best to parallelize range queries in a massive-scale distributed database. Traditional systems have focused on maximizing parallelism, for example by laying out data to achieve the highest throughput. In a massive-scale database such as the authors' PNUTS system or BigTable, however, maximizing parallelism is not necessarily the best strategy, for two reasons. First, the system has more than enough servers to saturate a single client, returning results faster than the client can consume them. Second, when multiple queries run concurrently, maximizing parallelism for all of them causes disk contention and reduces performance for every query.
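The saturation argument can be made concrete with a little arithmetic. The Python sketch below is only an illustration under simplifying assumptions, not the authors' algorithm: it assumes every server scans at the same fixed rate and the client consumes results at a fixed rate, and the function and parameter names (`servers_to_saturate_client`, `choose_parallelism`, `client_rate`, `per_server_rate`) are hypothetical.

```python
import math


def servers_to_saturate_client(client_rate: float, per_server_rate: float) -> int:
    """Smallest number of servers whose combined rate meets the client's rate.

    Assumption: rates are constant and in the same unit (e.g. records/second).
    """
    if client_rate <= 0 or per_server_rate <= 0:
        raise ValueError("rates must be positive")
    return math.ceil(client_rate / per_server_rate)


def choose_parallelism(client_rate: float, per_server_rate: float,
                       available_servers: int) -> int:
    """Cap parallelism at the point where the client is already saturated.

    Scanning on more servers than this returns results faster than the
    client can consume them, while still competing for disks with other
    concurrent queries.
    """
    needed = servers_to_saturate_client(client_rate, per_server_rate)
    return min(available_servers, needed)


if __name__ == "__main__":
    # Hypothetical numbers: the client consumes 120 records/s, each server
    # delivers 30 records/s, and 1000 servers hold parts of the range.
    print(choose_parallelism(120.0, 30.0, 1000))
```

With these made-up numbers, four servers already keep the client busy, so the remaining servers are better left free for other queries. A real system would have to estimate both rates at run time and adjust as they change.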
