An Improved Method for Classifying XML Documents Based on Structure and Content
Source: Academy Publisher
As more structured and semi-structured data is stored and exchanged in XML format, XML mining becomes increasingly important, and the classification of XML documents in particular is being studied more widely. To address a weakness of current methods that classify XML documents based on structure and content, this paper presents an improved similarity measure called NM-Similarity, which maintains a high accuracy rate when XML documents are similar in structure but different in content. The measure is applied within the KNN (K-Nearest Neighbor) method for classification. The structure similarity between two XML documents is computed using Euclidean distance, and the content similarity is computed using the cosine measure.
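The general idea of combining a Euclidean-distance structure score with a cosine content score inside a KNN vote can be sketched as follows. This is only an illustrative sketch, not the paper's NM-Similarity definition, which the abstract does not give. The tag-path feature vectors, the mapping 1 / (1 + distance), the weight `alpha`, and all function names are assumptions introduced here.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def tag_path_counts(elem, prefix=""):
    """Count root-to-node tag paths (an assumed structural feature)."""
    path = f"{prefix}/{elem.tag}"
    counts = Counter({path: 1})
    for child in elem:
        counts.update(tag_path_counts(child, path))
    return counts


def term_counts(elem):
    """Count lowercased, whitespace-separated terms over all text nodes."""
    return Counter(" ".join(elem.itertext()).lower().split())


def structure_similarity(a, b):
    """Euclidean distance between path-count vectors, mapped into (0, 1]."""
    dist = math.sqrt(sum((a[k] - b[k]) ** 2 for k in set(a) | set(b)))
    return 1.0 / (1.0 + dist)  # assumed mapping from distance to similarity


def content_similarity(a, b):
    """Cosine of the angle between term-count vectors."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_similarity(xml_a, xml_b, alpha=0.5):
    """Weighted sum of both scores; alpha is a hypothetical weight."""
    root_a, root_b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    s = structure_similarity(tag_path_counts(root_a), tag_path_counts(root_b))
    c = content_similarity(term_counts(root_a), term_counts(root_b))
    return alpha * s + (1 - alpha) * c


def knn_classify(query, labelled_docs, k=3, alpha=0.5):
    """Majority vote among the k labelled documents most similar to query."""
    ranked = sorted(
        labelled_docs,
        key=lambda doc: combined_similarity(query, doc[0], alpha),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

In this sketch, two documents with identical tag paths get a structure score of 1.0, so the content term is what separates them. That is the situation the abstract singles out: documents similar in structure but different in content.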
| Format: | Size: | 161.90 | |
| Date: | Aug 2010 |



