An Empirical Study on Selective Sampling in Active Learning for Splog Detection


Executive Summary

This paper studies how to reduce the amount of human supervision needed to distinguish splogs from authentic blogs when splog data sets are updated continuously, year by year. Following previous work on active learning, it empirically examines several selective sampling strategies for active learning with Support Vector Machines (SVMs) on the splog / authentic blog detection task. As a confidence measure for SVM learning, the authors use the distance from the separating hyperplane to each test instance, a measure that has been well studied in active learning for text classification.
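To make the confidence measure concrete, the following is a minimal sketch of one common selective sampling strategy: query the instances that lie closest to the separating hyperplane, since those are the ones the classifier is least sure about. It assumes a linear SVM that has already been trained, so the weight vector `w`, the bias `b`, the toy feature vectors, and both function names are hypothetical illustrations and are not taken from the paper, which compares several strategies.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Signed distance from instance x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def select_least_confident(w, b, unlabeled, k):
    """Return indices of the k unlabeled instances nearest the hyperplane.

    A small absolute distance is read as low confidence, so these are
    the instances handed to a human annotator in the next round.
    """
    ranked = sorted(
        range(len(unlabeled)),
        key=lambda i: abs(distance_to_hyperplane(w, b, unlabeled[i])),
    )
    return ranked[:k]


# Hypothetical trained linear model and unlabeled pool (assumed values).
w = [1.0, -1.0]
b = 0.0
pool = [[3.0, 0.0], [0.5, 0.4], [0.0, 2.0], [1.0, 1.2]]

queried = select_least_confident(w, b, pool, 2)
print(queried)
```

In an active learning loop, the selected instances would be labeled by a human, added to the training set, and the SVM retrained before the next selection round. Other strategies, such as selecting the most confident instances or mixing both ends of the ranking, only change the sort key or the slice taken from `ranked`.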

  • Format: PDF
  • Size: 1011.9 KB