Classification of email messages and web page content is essential to many tasks in web information retrieval, such as maintaining email directories, maintaining web directories, and focused crawling. The uncontrolled nature of email and web content presents additional challenges compared with traditional text classification, but the common fields of email messages and the interconnected nature of hypertext also provide features that can assist the process. As the authors review work in web page classification, they note the importance of these web-specific features and algorithms, describe state-of-the-art practices, and track the underlying assumptions behind the use of information from neighboring pages.
- Format: PDF
- Size: 587.3 KB