A Survey Paper on Deduplication by Using Genetic Algorithm Along with Hash-Based Algorithm

As the volume of information available in digital libraries grows, many systems are affected by replicas in their data warehouses. This matters because a clean, replica-free warehouse supports higher-quality information retrieval. It also produces more concise data and reduces the computational time and resources needed to process that data. The authors propose a genetic programming approach combined with hash-based similarity, using the MD5 and SHA-1 algorithms.
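The sketch below illustrates only the hash-based half of the idea. It is not the authors' method. It assumes that records are plain strings and that light normalization (lowercasing and collapsing whitespace) is acceptable. Each normalized record is fingerprinted with both MD5 and SHA-1, and records that share a fingerprint are treated as replicas. The function names `normalize`, `fingerprint`, `find_replicas` and `deduplicate` are hypothetical. This approach catches only exact replicas after normalization. The genetic programming component, which would be needed for approximate matches, is not shown.

```python
import hashlib
import re
from collections import defaultdict


def normalize(record: str) -> str:
    # Assumed normalization (hypothetical choice): trim, lowercase, and
    # collapse runs of whitespace so trivially different copies match.
    return re.sub(r"\s+", " ", record.strip().lower())


def fingerprint(record: str) -> tuple[str, str]:
    # Hash the normalized record with both MD5 and SHA-1; requiring both
    # digests to match makes an accidental collision even less likely.
    data = normalize(record).encode("utf-8")
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()


def find_replicas(records: list[str]) -> list[list[int]]:
    # Group record indices by fingerprint; any group with more than one
    # index is a set of exact replicas (after normalization).
    groups: dict[tuple[str, str], list[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        groups[fingerprint(rec)].append(i)
    return [idxs for idxs in groups.values() if len(idxs) > 1]


def deduplicate(records: list[str]) -> list[str]:
    # Keep the first occurrence of each fingerprint, preserving input order.
    seen: set[tuple[str, str]] = set()
    unique: list[str] = []
    for rec in records:
        fp = fingerprint(rec)
        if fp not in seen:
            seen.add(fp)
            unique.append(rec)
    return unique


if __name__ == "__main__":
    records = [
        "Deduplication Using Genetic Programming",
        "deduplication  using genetic programming ",
        "Hash-Based Similarity with MD5",
    ]
    print(find_replicas(records))  # [[0, 1]]
    print(deduplicate(records))    # first and third records only
```

Hashing turns each replica check into a dictionary lookup, so one pass over the warehouse is enough. The trade-off is that hashing is brittle. Changing a single character gives a completely different digest, so near-duplicates slip through unless a more flexible similarity function handles them.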

Provided by: International Journal of Engineering Research and Applications (IJERA) Topic: Data Management Date Added: Jan 2014 Format: PDF
