In earlier work (Uchida et al., PAKDD 2004), the authors described a text-mining algorithm that enumerates all frequent Consecutive Path Patterns (CPPs) on a list W of words, in order to extract structural features among the nodes of tree-structured documents in which characteristic words appear. In this paper, they first extend the CPP to a tree pattern, called a Tree Association Pattern (TAP), over a set W of words. A TAP is an ordered rooted tree t such that the root of t has either no children or at least two children, every leaf of t is labeled with a non-empty subset of W, and every internal node, if any, is labeled with a string.
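
The TAP definition can be read as three structural checks on a tree. The following is a minimal sketch in Python, not the authors' implementation: the names `Node` and `is_tap` are hypothetical, and it assumes that internal-node labels are represented as `str` and leaf labels as `frozenset` of words.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Union

# Assumption: an internal node carries a string, a leaf carries a set of words.
Label = Union[str, FrozenSet[str]]


@dataclass
class Node:
    """Hypothetical node of an ordered rooted tree; child order is list order."""
    label: Label
    children: List["Node"] = field(default_factory=list)


def is_tap(root: Node, words: FrozenSet[str]) -> bool:
    """Check the three conditions of the TAP definition over the word set `words`."""
    # Condition 1: the root has no children or at least two children.
    if len(root.children) == 1:
        return False

    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            # Condition 3: every internal node is labeled with a string.
            if not isinstance(node.label, str):
                return False
            stack.extend(node.children)
        else:
            # Condition 2: every leaf is labeled with a non-empty subset of W.
            if not isinstance(node.label, frozenset):
                return False
            if not node.label or not node.label <= words:
                return False
    return True


W = frozenset({"mining", "tree", "pattern"})

# A root with two leaf children, each labeled with a non-empty subset of W.
example = Node("section", [
    Node(frozenset({"mining"})),
    Node(frozenset({"tree", "pattern"})),
])
```

Under these assumptions, `is_tap(example, W)` accepts the tree above, while a root with exactly one child, a leaf with an empty label, or a leaf containing a word outside W is rejected. A single-node tree is also accepted, provided its label is a non-empty subset of W, since the root is then a leaf.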
- Format: PDF
- Size: 1679.36 KB