Efficient Preprocessing and Patterns Identification Approach for Text Mining
Due to the rapid expansion of digital data, knowledge discovery and data mining have attracted a significant amount of attention as means of turning such data into useful information and knowledge. Text categorization remains one of the most researched NLP problems because of the ever-increasing volume of electronic documents and digital libraries. The authors present a novel text categorization method that combines the decisions made on multiple attributes. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy.
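To make the polysemy and synonymy problem concrete, the following minimal sketch compares short texts using plain term-count vectors and cosine similarity. It is an illustration only, not the authors' method: the helper names `term_vector` and `cosine` and the four example phrases are assumptions introduced here.

```python
from collections import Counter
import math


def term_vector(text):
    """Bag of words: lowercase, whitespace-tokenized term counts (hypothetical helper)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (hypothetical helper)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Synonymy: the two phrases mean roughly the same thing but share no terms.
doc_car = term_vector("cheap car rental")
doc_auto = term_vector("inexpensive automobile hire")

# Polysemy: the two phrases share the term "bank" but use it in different senses.
doc_river = term_vector("river bank erosion")
doc_money = term_vector("bank loan interest")

synonym_score = cosine(doc_car, doc_auto)
polysemy_score = cosine(doc_river, doc_money)

print(f"synonymy example: {synonym_score:.3f}")
print(f"polysemy example: {polysemy_score:.3f}")
```

In this toy setting the synonymous phrases receive a similarity of zero because they have no terms in common, while the unrelated phrases receive a nonzero similarity purely because both contain "bank". A term-based representation has no way to correct either error on its own.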