Web-Scale k-Means Clustering

Date Added: Apr 2010
Format: PDF

The authors present two modifications to the popular k-means clustering algorithm to address the extreme requirements for latency, scalability, and sparsity encountered in user-facing web applications. First, they propose the use of mini-batch optimization for k-means clustering. This reduces computation cost by orders of magnitude compared to the classic batch algorithm while yielding significantly better solutions than online stochastic gradient descent. Second, they achieve sparsity with projected gradient descent, and give a fast ϵ- accurate projection onto the L1-ball.