Data Management

Less is More: Selecting Sources Wisely for Integration

Date Added: Feb 2013
Format: PDF

The authors note that the abundance of information surrounding users makes it tempting to integrate data from as many sources as possible. However, understanding, analyzing, and using these data is often hard. Too much data can introduce a huge integration cost, such as the expense of purchasing data and the resources needed for integration and cleaning. Furthermore, including low-quality data can degrade the quality of the integration results rather than deliver the desired quality gain. Thus, "the more the better" does not always hold for data integration; often, "less is more".