Content Oriented Automatic Text Categorization
This paper implements a web spam classifier that, given a web page, analyzes its features and tries to determine whether the page is spam. The efficiency of the classifier will be compared with the results of spam detection on text datasets using a Naive Bayes classifier. Text representation is the task of transforming the content of a textual document into a vector in the term space so that the document can be recognized and classified by a computer or a classifier. Different terms (i.e., words, phrases, or any other indexing units used to identify the contents of a text) have different importance in a text.
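
To make the idea of a term-space vector concrete, the following is a minimal sketch in Python. It assumes single words as the indexing units and a TF-IDF weighting (term count times log(N / document frequency)) as one common way to express that terms differ in importance; the text above does not name a specific weighting scheme. The function names and the three sample pages are hypothetical and serve only as illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Assign each distinct term a fixed dimension in the term space."""
    terms = sorted({term for doc in documents for term in tokenize(doc)})
    return {term: index for index, term in enumerate(terms)}


def inverse_document_frequency(documents, vocabulary):
    """Compute log(N / df) per term, so rarer terms receive larger weights."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term at most once per document.
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}


def to_vector(text, vocabulary, idf):
    """Represent one document as a TF-IDF weighted vector (assumed scheme)."""
    counts = Counter(tokenize(text))
    vector = [0.0] * len(vocabulary)
    for term, count in counts.items():
        if term in vocabulary:  # terms unseen during fitting are ignored
            vector[vocabulary[term]] = count * idf[term]
    return vector


# Hypothetical sample pages, invented for this sketch.
pages = [
    "cheap pills buy cheap pills now",
    "buy tickets for the conference now",
    "the conference schedule and the speakers",
]
vocabulary = build_vocabulary(pages)
idf = inverse_document_frequency(pages, vocabulary)
vectors = [to_vector(page, vocabulary, idf) for page in pages]
```

Each entry of `vectors` has one component per vocabulary term, so a classifier such as Naive Bayes can consume the pages as fixed-length numeric inputs. A term that occurs in only one page (for example "cheap") gets a higher weight than a term shared by several pages (for example "now").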