International Journal of Modern Engineering Research (IJMER)
With the explosion of e-commerce, online communication, and online publishing, texts have become available in a variety of genres, such as web search snippets, forum and chat messages, blogs, book and movie summaries, product descriptions, and customer reviews. Processing them successfully is therefore increasingly important in many Web applications. However, matching, classifying, and clustering these kinds of text and web data pose new challenges. Unlike normal documents, these text and web segments are usually noisier, less topic-focused, and much shorter: they range from a dozen words to a few sentences.