Journal of Theoretical and Applied Information Technology
Duplicate and near duplicate web pages are stopping the process of search engine. As a consequence of duplicate and near duplicates, the common issue for the search engines is raising the indexed storage pages. This high storage memory will slow down the process which automatically increases the serving cost. Finally, the duplication will be raised while gathering the required data from the various sources based on the user's query. The duplication will definitely slow down the information retrieval process.