Algorithm for Enumerating All Maximal Frequent Tree Patterns Among Words in Tree-Structured Documents and Its Application
Source: Science and Development Network (SciDev.Net)
To extract structural features from tree-structured documents, specifically among the nodes in which characteristic words appear, the authors previously described a text-mining algorithm for enumerating all frequent Consecutive Path Patterns (CPPs) on a list W of words (Uchida et al., PAKDD 2004). In this paper, they first extend the CPP to a tree pattern over a set W of words, called a Tree Association Pattern (TAP). A TAP is an ordered rooted tree t such that the root of t has either no children or at least two children, every leaf of t is labeled with a non-empty subset of W, and every internal node, if any exist, is labeled with a string.
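The TAP definition above can be made concrete with a small validity check. The following is only a minimal sketch of the three structural conditions stated in the abstract, not the authors' enumeration algorithm; the names `Node` and `is_tap`, and the choice to represent leaf labels as `frozenset` values, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Union

# Hypothetical representation: internal nodes carry a string label,
# leaves carry a frozenset of words. Child order is the list order.
Label = Union[str, FrozenSet[str]]


@dataclass
class Node:
    label: Label
    children: List["Node"] = field(default_factory=list)


def is_tap(root: Node, words: FrozenSet[str]) -> bool:
    """Check the TAP conditions from the abstract against a word set W."""
    # Condition 1: the root has no child or at least two children.
    if len(root.children) == 1:
        return False

    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            # Condition 3: internal nodes are labeled with strings.
            if not isinstance(node.label, str):
                return False
            stack.extend(node.children)
        else:
            # Condition 2: leaves are labeled with non-empty subsets of W.
            if not isinstance(node.label, frozenset):
                return False
            if not node.label or not node.label <= words:
                return False
    return True


# Example usage with an assumed word set W.
W = frozenset({"tree", "pattern", "mining"})
example = Node("doc", [
    Node(frozenset({"tree"})),
    Node("sec", [Node(frozenset({"pattern", "mining"}))]),
])
print(is_tap(example, W))
```

Note that only the root is restricted to zero or at least two children; in this sketch a non-root internal node such as `sec` may have a single child, which follows the wording of the definition as given.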
| Format: | |
| Size: | 1679.36 |
| Date: | Dec 2009 |



