This paper studies how to reduce the amount of human supervision needed to distinguish splogs (spam blogs) from authentic blogs when splog data sets are updated continuously, year by year. Following previous work on active learning, it empirically examines several selective sampling strategies for active learning with Support Vector Machines (SVMs) on the splog / authentic blog detection task. As a confidence measure for SVM learning, the authors employ the distance from the separating hyperplane to each test instance, a measure that has been well studied in active learning for text classification.
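To make the confidence measure concrete, the following is a minimal sketch of one common selective sampling strategy: querying the instances that lie closest to the separating hyperplane. It assumes a linear SVM that has already been trained. The weights `w`, bias `b`, the toy `pool`, and the function names are hypothetical and chosen only for illustration; the abstract does not say which of the examined strategies the authors prefer.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Signed distance from instance x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def select_least_confident(w, b, pool, k):
    """Return indices of the k pool instances closest to the hyperplane.

    A small absolute distance means low classifier confidence, so these
    instances are the ones handed to a human annotator for labeling.
    """
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(distance_to_hyperplane(w, b, pool[i])),
    )
    return ranked[:k]


# Hypothetical trained linear SVM (assumption: not taken from the paper).
w, b = [1.0, -1.0], 0.0

# Hypothetical unlabeled pool of two-dimensional feature vectors.
pool = [[3.0, 0.0], [0.5, 0.4], [0.0, 2.0], [1.0, 1.2]]

# Indices of the two instances to send for human labeling.
queries = select_least_confident(w, b, pool, k=2)
print(queries)
```

Sorting by the absolute distance selects the least confident instances. Sorting in the opposite direction would instead select the most confident ones, which is another sampling strategy of the kind such a study might compare.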
- Format: PDF
- Size: 1011.9 KB