Record de-duplication is the task of recognizing, in a data warehouse, records that refer to the same real-world entity despite misspelled words, typos, different writing styles, or even different schema representations or data types. Existing research has proposed a Genetic Programming (GP) approach to record de-duplication. That approach combines several different pieces of evidence extracted from the data content to produce a de-duplication function that is able to recognize whether two or more entries in a repository are duplicates or not.
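To make the idea of combining evidence concrete, the following minimal Python sketch compares two records field by field and feeds the resulting similarity scores into a single de-duplication function. Everything here is an illustrative assumption rather than the method of the cited research: the field names `name` and `city`, the use of `difflib.SequenceMatcher` as the similarity measure, the hand-written weighted formula in `dedup_function`, and the `0.8` threshold are all hypothetical. In a GP approach, the combining expression would be evolved from training data instead of written by hand.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a string similarity in [0, 1], ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def evidence(rec_a, rec_b, fields):
    """Build one piece of evidence (a similarity score) per compared field."""
    return {f: similarity(rec_a[f], rec_b[f]) for f in fields}


def dedup_function(ev):
    # Hypothetical hand-written combination of the evidence.
    # A GP run would search for an expression like this automatically.
    return (2.0 * ev["name"] + ev["city"]) / 3.0


def is_duplicate(rec_a, rec_b, threshold=0.8):
    """Flag a pair as duplicates when the combined score reaches the threshold."""
    ev = evidence(rec_a, rec_b, ["name", "city"])
    return dedup_function(ev) >= threshold


# Example records (made up for illustration).
rec_1 = {"name": "John Smith", "city": "New York"}
rec_2 = {"name": "Jon Smith", "city": "new york"}
rec_3 = {"name": "Mary Jones", "city": "Boston"}

print(is_duplicate(rec_1, rec_2))  # misspelled name, different letter case
print(is_duplicate(rec_1, rec_3))  # unrelated records
```

The sketch keeps evidence extraction separate from the combining function so that the latter can be swapped out, which is the part a GP search would replace.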
- Format: PDF
- Size: 392.71 KB