Join Optimization of Information Extraction Output: Quality Matters!

Executive Summary

Information Extraction (IE) systems are trained to extract specific relations from text databases. Real-world applications often require that the output of multiple IE systems be joined to produce the data of interest. To optimize the execution of a join of multiple extracted relations, it is not sufficient to consider only execution time. In fact, the quality of the join output is of critical importance: unlike in the relational world, different join execution plans can produce join results of widely different quality whenever IE systems are involved. In this paper, the authors develop a principled approach to understand, estimate, and incorporate output quality into the join optimization process over extracted relations.

  • Format: PDF
  • Size: 415.9 KB