Information Extraction (IE) systems are trained to extract specific relations from text databases. Real-world applications often require that the output of multiple IE systems be joined to produce the data of interest. To optimize the execution of a join of multiple extracted relations, it is not sufficient to consider only execution time. The quality of the join output is also of critical importance: unlike in the relational world, different join execution plans can produce join results of widely different quality whenever IE systems are involved. In this paper, the authors develop a principled approach for understanding, estimating, and incorporating output quality into the join optimization process over extracted relations.
- Format: PDF
- Size: 415.9 KB