Evaluating the Similarity of XML Documents Based on Frequent Label Sequences

Provided by: AICIT
Topic: Big Data
Format: PDF
Clustering XML documents efficiently yet accurately is of great interest to web users. Among the methods that address this problem, clustering XML documents based on the frequent patterns they contain seems to be a novel and interesting one. The intuition behind the clustering criterion is that documents within the same cluster share more common sequences, while documents in different clusters share fewer or none. In this paper, the authors introduce a method that uses Frequent Label Sequences (FLS) as features to represent XML documents, and propose a new similarity metric that calculates the differences between documents based on their tag sequences.
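The abstract does not give the paper's definitions, so the following is only a rough sketch of the general idea, not the authors' method. It assumes that a "label sequence" is a root-to-node path of tag names, that a sequence is "frequent" when it appears in at least `min_support` documents, and it substitutes a plain Jaccard overlap for the paper's similarity metric, which is not described here. The function names (`label_paths`, `frequent_label_sequences`, `similarity`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def label_paths(xml_text):
    """Return the set of root-to-node tag sequences in one document.

    Assumption: a label sequence is a path of tag names from the root.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + (child.tag,)))
    return paths


def frequent_label_sequences(docs, min_support):
    """Keep the sequences that occur in at least min_support documents."""
    counts = Counter()
    for paths in docs:
        counts.update(paths)  # each document counts a given path once
    return {p for p, c in counts.items() if c >= min_support}


def similarity(a, b, frequent):
    """Jaccard overlap of the frequent sequences two documents contain.

    This is a stand-in measure, not the metric proposed in the paper.
    """
    fa, fb = a & frequent, b & frequent
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


# Three toy documents: two book records and one CD record.
docs = [label_paths(x) for x in (
    "<book><title/><author/></book>",
    "<book><title/><price/></book>",
    "<cd><artist/></cd>",
)]
fls = frequent_label_sequences(docs, min_support=2)
book_vs_book = similarity(docs[0], docs[1], fls)
book_vs_cd = similarity(docs[0], docs[2], fls)
```

Under these assumptions the two book documents share all of their frequent sequences, while the book and CD documents share none, which matches the clustering intuition stated in the abstract.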
