Date Added: Jun 2010
In this paper, the authors use Cloud computing technologies to scale out data management in geographical databases. In particular, they tackle the problem of indexing data in parallel. First, spatial data is partitioned and indexed in a Hadoop MapReduce cluster. Two main partitioning strategies are evaluated: a linear-complexity method based on Z-order values and an iterative algorithm based on X-means clustering. The advantages and drawbacks of each method are weighed with respect to query performance. Second, interactive queries are processed from a local site using the index data structures built in the Cloud. The authors perform an experimental study on a real dataset of 110 million spatial objects representing property parcels in the United States.
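
To illustrate the Z-order strategy mentioned in the abstract, here is a minimal single-machine sketch, not the authors' implementation. It assumes that each spatial object is reduced to a representative 2D point, that coordinates are normalized onto a 16-bit grid, and that partitions are equal-sized runs of the Z-ordered sequence. The names `interleave_bits`, `z_value`, and `partition_by_z_order` are hypothetical. The sketch sorts the points for simplicity, so only the Z-value computation is a single linear pass here.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) code of grid cell (x, y).

    Bit i of x is placed at position 2*i, bit i of y at position 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def z_value(px, py, bounds, bits=16):
    """Map a point onto a 2^bits x 2^bits grid and return its Z-order code.

    bounds is (min_x, min_y, max_x, max_y); this normalization is an
    assumption of the sketch, not something stated in the abstract.
    """
    min_x, min_y, max_x, max_y = bounds
    scale = (1 << bits) - 1
    gx = int((px - min_x) / (max_x - min_x) * scale)
    gy = int((py - min_y) / (max_y - min_y) * scale)
    return interleave_bits(gx, gy, bits)


def partition_by_z_order(points, bounds, num_partitions, bits=16):
    """Split points into contiguous, roughly equal runs along the Z-curve."""
    ordered = sorted(points, key=lambda p: z_value(p[0], p[1], bounds, bits))
    # Ceiling division; at least 1 so an empty input yields no partitions.
    size = max(1, -(-len(ordered) // num_partitions))
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

Points that are close along the Z-curve tend to be close in space, so each partition covers a fairly compact region and could be indexed independently, for example by one task per partition in a MapReduce job.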