
Active Learning for Crowd-Sourced Databases


Executive Summary

In this paper, the authors present algorithms that integrate machine learning into crowd-sourced databases for acquiring labeled data. The key observation is that humans and machine learning algorithms are complementary on a number of tasks. In image labeling, for example, humans generally provide more accurate labels but are slow and expensive, while algorithms are usually less accurate but faster and cheaper. Based on this observation, the authors present two active learning algorithms designed to decide how to use humans and algorithms together in a crowd-sourced database. They look at two settings: upfront and iterative. In the upfront setting, the aim is to identify the items that would be hard for algorithms to label and to ask humans to label those items.
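The upfront setting can be illustrated with a minimal sketch. The summary does not say how the paper decides which items are hard, so this example substitutes plain uncertainty sampling: an item counts as hard when a classifier's predicted probability is close to 0.5. The function names, the item identifiers, the scores, and the `budget` parameter are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def uncertainty(p_positive: float) -> float:
    """Binary-label uncertainty: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * p_positive - 1.0)


def split_upfront(scores: Dict[str, float], budget: int) -> Tuple[List[str], List[str]]:
    """Split items once, up front, between the crowd and the classifier.

    scores maps each item to the classifier's predicted probability of the
    positive label (assumed input). The `budget` most uncertain items go to
    human workers; the rest keep the classifier's label.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Rank items from most to least uncertain.
    ranked = sorted(scores, key=lambda item: uncertainty(scores[item]), reverse=True)
    return ranked[:budget], ranked[budget:]


# Hypothetical classifier outputs for four images.
scores = {"img_a": 0.97, "img_b": 0.52, "img_c": 0.08, "img_d": 0.40}
to_humans, to_algorithm = split_upfront(scores, budget=2)
print("ask humans:", to_humans)
print("keep algorithm labels:", to_algorithm)
```

With a budget of two human labels, the two items whose scores lie nearest 0.5 are sent to the crowd and the confident predictions are kept as they are. This only shows the shape of the decision; the paper's own selection criteria may differ.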

  • Format: PDF
  • Size: 949.23 KB