An Empirical Study on Selective Sampling in Active Learning for Splog Detection

Executive Summary

This paper studies how to reduce the amount of human supervision needed to distinguish splogs from authentic blogs when splog data sets are continuously updated year by year. Following previous work on active learning, it empirically examines several selective-sampling strategies for active learning with Support Vector Machines (SVMs), applied to the task of splog / authentic blog detection. As a confidence measure for SVM learning, the authors employ the distance from the separating hyperplane to each test instance, a measure that has been well studied in active learning for text classification.
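To make the confidence measure concrete, here is a minimal sketch of one common selective-sampling strategy: query the instances closest to the separating hyperplane, since those are the ones the classifier is least confident about. It assumes a linear SVM whose weight vector and bias are already known. The names `margin_distance` and `select_least_confident`, and the toy numbers, are hypothetical illustrations and are not taken from the paper, which compares several strategies and may not favor this one.

```python
import math


def margin_distance(weights, bias, x):
    """Unsigned distance from instance x to the hyperplane w.x + b = 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    norm = math.sqrt(sum(w * w for w in weights))
    return abs(score) / norm


def select_least_confident(weights, bias, pool, k):
    """Return indices of the k pool instances nearest the hyperplane.

    A small distance is treated as low confidence, so these instances
    are the ones sent to a human annotator for labeling.
    """
    ranked = sorted(
        range(len(pool)),
        key=lambda i: margin_distance(weights, bias, pool[i]),
    )
    return ranked[:k]


# Hypothetical linear model and unlabeled pool (illustrative values only).
weights = [3.0, 4.0]
bias = -5.0
pool = [[1.0, 1.0], [3.0, 4.0], [1.0, 0.5], [0.0, 0.0]]

queried = select_least_confident(weights, bias, pool, k=2)
print(queried)  # indices of the two instances to label next
```

In an active learning loop, the queried instances would be labeled, added to the training set, and the SVM retrained before the next round of selection.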

  • Format: PDF
  • Size: 1011.9 KB