A Novel Approach to Perform Document Clustering Using Effectiveness and Efficiency of Simhash

Provided by: International Journal of Engineering and Advanced Technology (IJEAT)
Topic: Data Management
Format: PDF
Similarity is the central concept in document clustering. As the number of web documents grows and documents must be integrated from multiple large repositories, clustering similar documents efficiently becomes a challenging problem. A measure of the similarity between two patterns drawn from the same feature space is essential to most clustering procedures. Identifying similar documents in huge repositories is costly in both space and time, especially when searching for near-duplicate documents in collections where documents may be added or deleted.
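To illustrate the kind of compact similarity measure the title refers to, the following is a minimal sketch of a generic Simhash fingerprint in Python. It is not the method of the paper, whose details are not given in this abstract. The function names `simhash` and `hamming_distance`, the word-level tokenization, the unweighted tokens, and the use of MD5 as the token hash are all assumptions made for illustration.

```python
import hashlib
import re

BITS = 64  # fingerprint width (an assumption; 64 bits is a common choice)


def simhash(text: str) -> int:
    """Return a 64-bit Simhash-style fingerprint of `text`.

    Assumptions: tokens are lowercase words, every token has weight 1,
    and the first 8 bytes of MD5 serve as the per-token hash.
    """
    counts = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 on bit positions it sets and -1 on the others.
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is 1 where the positive votes outnumber the negative.
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    d1 = simhash("clustering similar documents from large repositories")
    d2 = simhash("clustering similar documents from huge repositories")
    print(hamming_distance(d1, d2))
```

Documents that share most of their tokens tend to produce fingerprints with a small Hamming distance, so comparing two documents reduces to comparing two 64-bit integers. This keeps storage per document fixed and lets a fingerprint be added or dropped when a document is added or deleted, without rehashing the rest of the collection.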
