Large-Scale Support Vector Machines: Algorithms and Theory
Support Vector Machines (SVMs) are a popular method for binary classification. Traditional SVM training algorithms, such as chunking and SMO, scale superlinearly with the number of training examples, which quickly becomes infeasible for large training sets. Because dataset sizes have commonly been observed to grow steadily over the past few years, training algorithms that scale at worst linearly with the number of examples are needed. The authors survey work on SVM training methods that target this large-scale learning regime. Most of these algorithms use either variants of Stochastic Gradient Descent (SGD) on the primal problem or quadratic programming in the dual.
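As an illustration of the primal SGD family, the following is a minimal sketch of a linear SVM trained by SGD on the L2-regularized hinge loss, (λ/2)‖w‖² + (1/n) Σᵢ max(0, 1 − yᵢ⟨w, xᵢ⟩), using a Pegasos-style step size ηₜ = 1/(λt). This is an assumed example, not a method taken from the surveyed work. The names `sgd_linear_svm` and `predict`, the parameters `lam`, `epochs`, and `seed`, and the omission of a bias term are all hypothetical choices made for brevity. Each epoch touches every example exactly once, so the cost per epoch is O(n·d), which is linear in the number of examples n.

```python
import random

def sgd_linear_svm(X, y, lam=0.01, epochs=10, seed=0):
    """Hypothetical sketch: primal SGD for a linear SVM without a bias term.

    X: list of feature vectors (lists of floats); y: labels in {-1, +1}.
    Minimizes (lam/2)*||w||^2 + mean(max(0, 1 - y_i * <w, x_i>))
    with the Pegasos-style step size eta_t = 1 / (lam * t).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        # Visit every example once per epoch, in a random order.
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            # Margin is computed with the current w, before the update.
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Gradient step on the regularizer: shrink w by (1 - eta * lam).
            scale = 1.0 - eta * lam
            w = [scale * wj for wj in w]
            # The hinge-loss subgradient is nonzero only when margin < 1.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    """Return the predicted label in {-1, +1} for feature vector x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

For example, `w = sgd_linear_svm([[2, 1], [1, 2], [-2, -1], [-1, -2]], [1, 1, -1, -1])` learns a weight vector that separates the two toy classes through the origin. A dual method such as SMO would instead optimize one Lagrange multiplier per example. The primal SGD route avoids storing those multipliers and any kernel matrix, which is one reason it suits the large-scale regime for linear models.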