A Novel XML Structural Similarity Calculation Method Based on Structural Information Content Using MapReduce

Provided by: AICIT
Topic: Big Data
Format: PDF
Structural similarity computation plays a crucial role in many applications. Taking both the nodes and the edges of XML trees into account, the authors propose in this paper to use Structural Information Content (SIC) to measure structural similarity. By recursively computing the SICs of all topological subtrees of the pattern tree, they evaluate the structural similarity of data trees to the pattern tree. To cope with large-scale collections of XML documents, they propose an efficient computation framework using MapReduce. Experiments demonstrated that the proposed method produced better similarity and clustering results than some existing methods.
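
The abstract does not give the SIC formula, so the following is only a rough sketch of the overall shape of such a computation, not the authors' method. It assumes a stand-in score (Jaccard overlap of node labels and parent-child edges) in place of SIC, and plain Python functions in place of a real MapReduce cluster. The names `structure`, `similarity`, `map_phase`, and `reduce_phase` are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET


def structure(xml_text):
    """Collect the node labels and parent-child edges of an XML tree."""
    root = ET.fromstring(xml_text)
    nodes, edges = set(), set()
    stack = [root]
    while stack:
        elem = stack.pop()
        nodes.add(elem.tag)
        for child in elem:
            edges.add((elem.tag, child.tag))
            stack.append(child)
    return nodes, edges


def similarity(pattern_xml, data_xml):
    """Stand-in score, NOT the paper's SIC: Jaccard overlap of nodes and edges."""
    p_nodes, p_edges = structure(pattern_xml)
    d_nodes, d_edges = structure(data_xml)
    p = p_nodes | p_edges
    d = d_nodes | d_edges
    return len(p & d) / len(p | d)


def map_phase(pattern_xml, docs):
    """Map step: score each data tree against the pattern tree independently."""
    return [(doc_id, similarity(pattern_xml, text)) for doc_id, text in docs.items()]


def reduce_phase(pairs):
    """Reduce step: gather the per-document scores into one ranked list."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    pattern = "<a><b/><c/></a>"
    docs = {
        "d1": "<a><b/><c/></a>",
        "d2": "<a><b/></a>",
        "d3": "<x><y/></x>",
    }
    for doc_id, score in reduce_phase(map_phase(pattern, docs)):
        print(doc_id, round(score, 2))
```

Because each document is scored against the pattern tree without reference to any other document, the map step can be distributed across workers, which is the property a MapReduce framework relies on.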
