Topic Modeling of Freelance Job Postings to Monitor Web Service Abuse
Source: University of California
Web services such as Google, Facebook, and Twitter are recurring victims of abuse, and the problem will only worsen as more attackers are drawn to their large user bases. Many attackers hire cheap human labor to carry out their schemes, connecting with potential workers through crowdsourcing and freelancing sites such as Mechanical Turk and Freelancer.com. To identify solicitations for abuse jobs, these Web sites need ways to distinguish such tasks from ordinary jobs. In this paper, the authors show how to discover clusters of abuse tasks using Latent Dirichlet Allocation (LDA), an unsupervised method for topic modeling in large corpora of text.
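To illustrate the general idea of fitting LDA to job postings, the following is a minimal sketch, not the authors' implementation. It uses a collapsed Gibbs sampler written in plain Python. The toy postings, the function names fit_lda and top_words, and the values chosen for num_topics, alpha, and beta are all assumptions made for illustration; the abstract does not specify the inference method, the preprocessing, or the hyperparameters used in the paper.

```python
import random


def fit_lda(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch only).

    docs is a list of token lists. Returns the vocabulary, the
    per-document topic counts, and the per-topic word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n):
    """Return the n highest-count words for each topic."""
    result = []
    for counts in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda v: counts[v], reverse=True)
        result.append([vocab[v] for v in ranked[:n]])
    return result


# Invented toy postings, for illustration only (not data from the paper).
postings = [
    "need bulk accounts created with phone verification".split(),
    "create verified accounts in bulk for social site".split(),
    "design a logo for my bakery website".split(),
    "logo and website design for small bakery".split(),
]

vocab, doc_topic, topic_word = fit_lda(postings, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word, 3)):
    print(f"topic {k}: {words}")
```

In a setting like the one the abstract describes, one would inspect the top words of each learned topic and flag the topics whose vocabulary suggests abuse work; that inspection step is manual in this sketch. On a corpus this small the learned topics are not reliable, so the example only shows the mechanics.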