An Optimized Approach of Modified BAT Algorithm to Record Deduplication

Record de-duplication is the task of identifying, in a data warehouse, records that refer to the same real-world entity despite misspelled words, typos, different writing styles, or even different schema representations or data types. Existing research has proposed a Genetic Programming (GP) approach to record de-duplication. That approach combines several pieces of evidence extracted from the data content to generate a de-duplication function capable of identifying whether two or more entries in a repository are duplicates.
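To make the idea of a de-duplication function concrete, the following is a minimal sketch of combining per-field similarity evidence into one duplicate/non-duplicate decision. It is not the GP approach or the modified BAT algorithm: the combination here is a hand-written weighted average, standing in for the function that such a search would produce. The field names, weights, threshold, and helper names (`similarity`, `evidence`, `is_duplicate`) are assumptions chosen for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] after simple normalization."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def evidence(rec_a: dict, rec_b: dict, fields: list) -> list:
    """One piece of evidence per field: how similar the two values are."""
    return [similarity(rec_a[f], rec_b[f]) for f in fields]


def is_duplicate(rec_a, rec_b, fields, weights, threshold=0.8) -> bool:
    """Combine the evidence with a weighted average (assumed, not learned)."""
    scores = evidence(rec_a, rec_b, fields)
    combined = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    return combined >= threshold


# Hypothetical records: the first two differ only by a misspelled name.
r1 = {"name": "John Smith", "city": "New York"}
r2 = {"name": "Jon Smith", "city": "New York"}
r3 = {"name": "Alice Brown", "city": "Boston"}

fields = ["name", "city"]
weights = [2.0, 1.0]  # assumed: name counts twice as much as city

print(is_duplicate(r1, r2, fields, weights))
print(is_duplicate(r1, r3, fields, weights))
```

In an evolutionary or swarm-based setting, the fixed weights and threshold would be replaced by whatever combination the search finds most accurate on labeled record pairs.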

Provided by: International Journal of Computer Applications
Topic: Big Data
Date Added: Jan 2013
Format: PDF
