Data Management

The pq-Gram Distance Between Ordered Labeled Trees

Download Now Date Added: Feb 2010
Format: PDF

When integrating data from autonomous sources, exact matches of data items that represent the same real-world object often fail due to a lack of common keys. Yet in many cases structural information is available and can be used to match such data. Typically the matching must be approximate since the representations in the sources differ. The authors propose pq-grams to approximately match hierarchical data from autonomous sources and define the pq-gram distance between ordered labeled trees as an effective and efficient approximation of the fanout weighted tree edit distance.