An Improved Method for Classifying XML Documents Based on Structure and Content

Source: Academy Publisher

As more and more structured and semi-structured data is stored and exchanged in XML format, XML mining becomes increasingly important, and the classification of XML documents in particular is being studied more widely. To address a weakness of current XML classification methods that rely on both structure and content, this paper presents an improved similarity measure, called NM-Similarity, which maintains a high accuracy rate even when XML documents are similar in structure but different in content. The measure is used within the KNN (K-Nearest Neighbor) classification method. The structural similarity between two XML documents is computed using Euclidean distance, and the content similarity is computed using the cosine measure.
Format: PDF
Size: 161.90
Date: Aug 2010
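The abstract does not give the NM-Similarity formula, so the Python sketch below only illustrates the general pipeline it describes: a Euclidean-distance structural similarity and a cosine content similarity, combined and fed into a KNN vote. Several details are assumptions rather than the paper's method. Structure is represented as counts of root-to-node tag paths, content as term frequencies, the Euclidean distance is mapped to a similarity with 1/(1 + d), and the two scores are blended with a hypothetical weight `alpha`. All function names and sample data are hypothetical.

```python
import math
from collections import Counter


def structure_similarity(a, b):
    """Euclidean distance over tag-path counts, mapped into (0, 1].

    The 1 / (1 + d) mapping is an assumption; the paper's mapping is not stated.
    """
    keys = set(a) | set(b)
    dist = math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))
    return 1.0 / (1.0 + dist)


def content_similarity(a, b):
    """Cosine similarity between two term-frequency dictionaries."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_similarity(doc_a, doc_b, alpha=0.5):
    """Hypothetical linear blend of structure and content similarity."""
    s = structure_similarity(doc_a["structure"], doc_b["structure"])
    c = content_similarity(doc_a["content"], doc_b["content"])
    return alpha * s + (1 - alpha) * c


def knn_classify(query, training, k=3, alpha=0.5):
    """Label the query by majority vote among its k most similar documents."""
    ranked = sorted(
        training,
        key=lambda item: combined_similarity(query, item[0], alpha),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: d1 and d2 share a structure but differ in content.
TRAINING = [
    ({"structure": {"book/title": 1, "book/author": 1},
      "content": {"xml": 2, "mining": 1}}, "cs"),
    ({"structure": {"book/title": 1, "book/author": 1},
      "content": {"protein": 2, "cell": 1}}, "bio"),
    ({"structure": {"book/title": 1, "book/author": 2},
      "content": {"xml": 1, "classification": 1}}, "cs"),
]

if __name__ == "__main__":
    same_structure = {"book/title": 1, "book/author": 1}
    query_cs = {"structure": same_structure, "content": {"xml": 1, "mining": 1}}
    query_bio = {"structure": same_structure, "content": {"cell": 1}}
    print(knn_classify(query_cs, TRAINING, k=1))   # -> cs
    print(knn_classify(query_bio, TRAINING, k=1))  # -> bio
```

Because the two queries share their structure with both d1 and d2, the structural term cannot separate those documents, and the cosine content term decides the nearest neighbor. This is the situation the abstract says NM-Similarity is designed to handle.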