Enriched Format Text Categorization Using A Component Similarity Approach
Source: Academy Publisher
Text categorization has been widely studied for years. However, conventional approaches that work well on plain text perform poorly when they are applied directly to enriched format texts. A categorization approach applicable to enriched format text is proposed. During feature selection, the authors obtain a structure distribution weight for each feature using an extended structure model, so that the effects of document structure on categorization are fully considered. Text formats are also considered in feature weighting. The combined feature weighting approach strengthens the important parts of a text and weakens the less important ones.
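The abstract does not give the weighting formulas. The following is a minimal sketch of combined structure-and-format feature weighting, under these assumptions: each term occurrence is scaled by one multiplier for its structural position and another for its text format, and the scaled counts replace raw term frequency in a smoothed TF-IDF score. The weight tables, the function names (`weighted_tf`, `combined_weights`), and the TF-IDF combination are hypothetical illustrations, not the authors' extended structure model.

```python
import math
from collections import defaultdict

# Hypothetical multipliers (assumption): the source does not give actual values.
STRUCTURE_WEIGHT = {"title": 3.0, "heading": 2.0, "body": 1.0}
FORMAT_WEIGHT = {"bold": 1.5, "italic": 1.2, "plain": 1.0}


def weighted_tf(tokens):
    """Sum structure * format multipliers per term.

    tokens: list of (term, structure, fmt) triples for one document.
    """
    tf = defaultdict(float)
    for term, structure, fmt in tokens:
        tf[term] += STRUCTURE_WEIGHT[structure] * FORMAT_WEIGHT[fmt]
    return dict(tf)


def combined_weights(docs):
    """Return one {term: weight} dict per document.

    Weight = structure/format-weighted TF times a smoothed IDF,
    log((1 + n) / (1 + df)) + 1, which stays positive even for
    terms that occur in every document.
    """
    n = len(docs)
    tfs = [weighted_tf(doc) for doc in docs]
    df = defaultdict(int)
    for tf in tfs:
        for term in tf:
            df[term] += 1
    return [
        {term: w * (math.log((1 + n) / (1 + df[term])) + 1) for term, w in tf.items()}
        for tf in tfs
    ]


if __name__ == "__main__":
    docs = [
        [("enriched", "title", "plain"), ("format", "body", "bold"), ("text", "body", "plain")],
        [("text", "title", "plain"), ("plain", "body", "plain")],
    ]
    for weights in combined_weights(docs):
        print(weights)
```

In this sketch a term in a title or in bold type gets a larger weight than the same term in plain body text, which mirrors the abstract's goal of strengthening important parts and weakening less important ones; the actual multipliers would have to come from the paper's model.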