With the rapid growth of data volume, more and more applications have to be deployed in a distributed environment. To obtain high performance, the whole dataset must be carefully divided into multiple partitions and placed on distributed data nodes. During this process, the choice of partition key greatly affects overall performance. Nevertheless, few works address this topic. Most previous papers on data partitioning either use a simple strategy or rely on a commercial database system to choose partition keys.