Probabilistic Declarative Information Extraction

Executive Summary

Unstructured text represents a large fraction of the world's data. It often contains snippets of structured information (e.g., people's names and zip codes). Information Extraction (IE) techniques identify such structured information in text. In recent years, database research has pursued IE on two fronts: declarative languages and systems for managing IE tasks, and probabilistic databases for querying the output of IE. In this paper, the authors take the first step toward merging these two directions without loss of statistical robustness, by implementing a state-of-the-art statistical IE model, Conditional Random Fields (CRF), in the setting of a probabilistic database that treats statistical models as first-class data objects.

  • Format: PDF
  • Size: 689.5 KB